Abstract. It is known that the composition of schema mappings, each specified by 
source-to-target tgds (st-tgds), can be specified by a second-order tgd (SO tgd). We 
consider the question of what happens when target constraints are allowed. Specifically, 
we consider the question of specifying the composition of standard schema mappings (those 
specified by st-tgds, target egds, and a weakly acyclic set of target tgds). We show that 
SO tgds, even with the assistance of arbitrary source constraints and target constraints, 
cannot specify in general the composition of two standard schema mappings. Therefore, 
we introduce source-to-target second-order dependencies (st-SO dependencies), which are 
similar to SO tgds, but allow equations in the conclusion. We show that st-SO dependen- 
cies (along with target egds and target tgds) are sufficient to express the composition of 
every finite sequence of standard schema mappings, and further, every st-SO dependency 
specifies such a composition. In addition to this expressive power, we show that st-SO 
dependencies enjoy other desirable properties. In particular, they have a polynomial-time 
chase that generates a universal solution. This universal solution can be used to find the 
certain answers to unions of conjunctive queries in polynomial time. 

It is easy to show that the composition of an arbitrary number of standard schema 
mappings is equivalent to the composition of only two standard schema mappings. We 
show that surprisingly, the analogous result holds also for schema mappings specified by 
just st-tgds (no target constraints). That is, the composition of an arbitrary number of 
such schema mappings is equivalent to the composition of only two such schema mappings. 
This is proven by showing that every SO tgd is equivalent to an unnested SO tgd (one 
where there is no nesting of function symbols). The language of unnested SO tgds is quite 
natural, and we show that unnested SO tgds are capable of specifying the composition of 
an arbitrary number of schema mappings, each specified by st-tgds. Similarly, we prove 
unnesting results for st-SO dependencies, with the same types of consequences. 
1. Introduction 

Schema mappings are high-level specifications that describe the relationship between two 
database schemas, a source schema and a target schema. Because of the crucial importance 
of schema mappings for data integration and data exchange (see the surveys [351 ES]), 
several different operators on schema mappings have been singled out as important objects 
of study [9]. One of the most fundamental is the composition operator, which combines 
successive schema mappings into a single schema mapping. The composition operator can 
play a useful role each time the target of a schema mapping is also the source of another 
schema mapping. This scenario occurs, for instance, in schema evolution, where a schema 
may undergo several successive changes. It also occurs in extract-transform-load (ETL) 
processes in which the output of a transformation may be the input to another [45]. The 
composition operator has been studied in depth [231 Ell ED H2]. 

One of the most basic questions is: what is the language needed to express the 
composition of schema mappings? For example, if the schema mapping $\mathcal{M}_{12}$ is an st-tgd 
mapping, that is, a mapping specified by a finite set of the widely-studied source-to-target 
tuple-generating dependencies (st-tgds), and the schema mapping $\mathcal{M}_{23}$ is also an st-tgd 
mapping, is the composition $\mathcal{M}_{12} \circ \mathcal{M}_{23}$ also an st-tgd mapping? Fagin et al. [23] showed 
that, surprisingly, the answer is "No." In fact, they showed that it is necessary to pass to 
existential second-order logic to express this composition in general. Specifically, they 
defined a class of dependencies, which they call second-order tgds (SO tgds), which are 
source-to-target, with existentially-quantified function symbols, and they showed that this 
is the "language of composition". That is, they showed that the composition of any number 
of st-tgd mappings can be specified by an SO tgd. They also showed that every SO tgd 
specifies the composition of a finite number of st-tgd mappings. Thus, SO tgds are exactly 
the right language. 

What happens if we allow not only source-to-target constraints, but also target 
constraints? Target constraints are important in practice; examples of important target 
constraints are those that specify the keys of target relations, and referential integrity 
constraints (or inclusion dependencies [13]). This paper is motivated by the question of how 
to express the compositions of schema mappings that have target constraints. This question 
was first explored by Nash et al. [32], where an even more general class of constraints 
was studied: constraints expressed over the joint source and target schemas without any 
restrictions. Here we study a case intermediate between that studied by Fagin et al. in 
[23] and that studied by Nash et al. in [32]. Specifically, we study standard schema 
mappings, where the source-to-target constraints are st-tgds, and the target constraints consist 
of target equality-generating dependencies (t-egds) and a weakly acyclic set [22] of target 
tuple-generating dependencies (t-tgds). Standard schema mappings have a chase that is 
guaranteed to terminate in polynomial time. In fact, weak acyclicity was introduced in [22] 
in order to provide a fairly general sufficient condition for the chase to terminate in 
polynomial time (a slightly less general class was introduced in [16], under the name constraints 
with stratified witness, for the same purpose). 

Standard schema mappings are a natural "sweet spot" between the schema mappings 
studied by Fagin et al. [22] (with only source-to-target constraints) and the schema mappings 
studied by Nash et al. [32] (with general constraints), for two reasons. The first reason is 
the importance of standard schema mappings. Source-to-target tgds are the natural and 
common backbone language of data exchange systems [20]. Furthermore, even though the 
notion of weakly acyclic sets of tgds was introduced only recently, it has now been studied 
extensively. Among the important special cases of weakly acyclic sets of tgds are sets of full tgds 
(those with no existential quantifiers) and acyclic sets of inclusion dependencies [13], a large 
class that is common in practice. The second reason for our interest in standard schema 
mappings is that as we shall see, compositions of standard schema mappings have especially 
nice properties. Thus, the language of standard schema mappings is expressive enough to 
be useful in practice, and yet simple enough to allow nice properties, such as having a 
polynomial-time chase. 

There are various inexpressibility results in [23] and [32] that show the inability of first- 
order logic to express compositions. Thus, each of these results says that there is a pair of 
schema mappings that are each specified by simple formulas in first-order logic, but where 
the composition cannot be expressed in first-order logic. In this paper, we show that some 
compositions cannot be expressed even in certain fragments of second-order logic. First, 
we show that SO tgds are not adequate to express the composition of an arbitrary pair of 
standard schema mappings. It turns out that this is quite easy to show. But what if we 
allow not only SO tgds, but also arbitrary source constraints and target constraints? This 
is a more delicate problem. By making use of a notion of locality from [5], we show that 
even these are not adequate to express the composition of an arbitrary pair of standard 
schema mappings. 

Therefore, we introduce a richer class of dependencies, which we call source-to-target 
second-order dependencies (st-SO dependencies). This class of dependencies is the 
source-to-target restriction of the class $\mathrm{Sk}\forall\mathrm{CQ}^{=}$ of dependencies introduced in [42]. Our st-SO 
dependencies differ from SO tgds in that st-SO dependencies may have not only relational 
atomic formulas $R(t_1, \ldots, t_n)$ in the conclusions, but also equalities $t_1 = t_2$. We show 
that st-SO dependencies are exactly the right extension of SO tgds for the purpose of 
expressing the composition of standard schema mappings. Specifically, we show that (1) 
the composition of standard schema mappings can be expressed by an st-SO dependency 
(along with target constraints), and (2) every st-SO dependency specifies the composition of 
some finite sequence of standard schema mappings. We note that a result analogous to (1), 
but for schema mappings that are not necessarily source-to-target, was obtained in [42] by 
using their class $\mathrm{Sk}\forall\mathrm{CQ}^{=}$ of dependencies. In fact, our proof of (1) is simply a variation of 
the proof in [42]. 

In addition, we show that st-SO dependencies enjoy other desirable properties. In 
particular, we show that they have a polynomial-time chase procedure. This chase procedure 
is novel, in that it has to keep track of constantly changing values of functions. As usual, 
the chase generates not just a solution, but a universal solution [22]. (Recall that a solution 
for a source instance $I$ with respect to a schema mapping $\mathcal{M}$ is a target instance $J$ where 
the pair $(I, J)$ satisfies the constraints of $\mathcal{M}$, and a universal solution is a solution with 
a homomorphism to every solution.) The fact that the chase is guaranteed to terminate 
(whether in polynomial time or otherwise) implies that if there is a solution for a given source 
instance $I$, then there is a universal solution. The fact that the chase runs in polynomial 
time guarantees that there is a polynomial-time algorithm for deciding if there is a solution, 
and, if so, for producing a universal solution. 

Let $q$ be a query posed against the target schema. The certain answers for $q$ on a source 
instance $I$, with respect to a schema mapping $\mathcal{M}$, are those tuples that appear in the answer 
$q(J)$ for every solution $J$ for $I$. It is shown in [22] that if $q$ is a union of conjunctive queries, 
and $J^*$ is a universal solution for $I$, then the certain answers for $q$ on $I$ can be obtained 
by evaluating $q$ on $J^*$ and then keeping only those tuples formed entirely of values from 
$I$. Since the chase using an st-SO dependency can be carried out in polynomial time, it 
follows that we can obtain a universal solution in polynomial time, and so we can compute 
the certain answers to unions of conjunctive queries in polynomial time. 
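
The recipe from [22] recalled above (evaluate the query on a universal solution, then discard tuples that contain nulls) can be illustrated by the following minimal Python sketch. The encoding is an assumption made only for illustration: instances are dictionaries from relation names to sets of tuples, nulls are objects of a hypothetical `Null` class, every other value counts as a constant, and the names `eval_cq` and `certain_answers` are not taken from the paper.

```python
class Null:
    """Labeled null; every value that is not a Null is treated as a constant."""
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return f"Null({self.label})"


def eval_cq(head, body, instance):
    """Evaluate one conjunctive query by naive backtracking.

    head: tuple of variable names; body: list of (relation, variables);
    instance: dict mapping relation names to sets of tuples.
    """
    answers = set()

    def extend(i, binding):
        if i == len(body):
            answers.add(tuple(binding[v] for v in head))
            return
        rel, args = body[i]
        for fact in instance.get(rel, ()):
            new = dict(binding)
            # A fact matches if it agrees with the values already bound.
            if all(new.setdefault(v, c) == c for v, c in zip(args, fact)):
                extend(i + 1, new)

    extend(0, {})
    return answers


def certain_answers(ucq, universal_solution):
    """Evaluate a union of CQs on a universal solution, keep null-free tuples."""
    tuples = set()
    for head, body in ucq:
        tuples |= eval_cq(head, body, universal_solution)
    return {t for t in tuples if not any(isinstance(v, Null) for v in t)}
```

For instance, on a universal solution with facts T(1, n) and T(2, 3), where n is a null, the query q(x, y) :- T(x, y) has the single certain answer (2, 3).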

In addition to our results about st-SO dependencies, we also have some results directly 
about compositions of schema mappings. It is easy to show that the composition of an 
arbitrary number of standard schema mappings is equivalent to the composition of only 
two standard schema mappings. We show the surprising result that a similar result holds 
also for st-tgd mappings (no target constraints). That is, the composition of an arbitrary 
number of st-tgd mappings is equivalent to the composition of only two st-tgd mappings. 
This is proven by showing that every SO tgd is equivalent to an unnested SO tgd (one 
where there is no nesting of function symbols). We also prove a similar denesting result for 
st-SO dependencies. These denesting results are the most difficult results technically in the 
paper. 

We feel that unnested dependencies are more natural, more readable, and easier to 
understand than nested dependencies. They are probably easier to use in practice. For 
example, it is easy to see that the "nested mappings" in [25] can be expressed by unnested SO 
tgds. We show that unnested SO tgds are also expressive enough to specify the composition 
of an arbitrary number of st-tgd mappings. This was not known even for the composition 
of two st-tgd mappings. Thus, although it was shown in [23] that each unnested SO tgd 
specifies the composition of a pair of st-tgd mappings, the converse was not shown. In 
fact, for the composition of two st-tgd mappings, the composition construction in [23] can 
produce an SO tgd with nesting depth 2, not 1. 

We close by discussing an application of our results. In practice, a composition of 
many schema mappings may arise (say, as the result of many steps of schema evolution). 
If these are st-tgd mappings, then there are several approaches towards "simplifying" this 
composition. One approach is to replace the composition of many st-tgd mappings by 
a single schema mapping, specified by an unnested SO tgd. For another approach, we 
can remain within the language of st-tgds by replacing the composition of many st-tgd 
mappings by the composition of only two st-tgd mappings. A similar comment applies to 
the composition of many standard schema mappings. 

2. Preliminaries 

A schema $\mathbf{R}$ is a finite set $\{R_1, \ldots, R_k\}$ of relation symbols, with each $R_i$ having a fixed 
arity $n_i > 0$. Let $\mathbf{D}$ be a countably infinite domain. An instance $I$ of $\mathbf{R}$ assigns to each 
relation symbol $R_i$ of $\mathbf{R}$ a finite $n_i$-ary relation $R_i^I \subseteq \mathbf{D}^{n_i}$. We let $\mathrm{Inst}(\mathbf{R})$ be the set 
of instances of $\mathbf{R}$. The domain (or active domain) $\mathrm{dom}(I)$ of instance $I$ is the set of all 
elements that occur in any of the relations $R_i^I$. We say that $R_i(a_1, \ldots, a_{n_i})$ is a fact of $I$ if 
$(a_1, \ldots, a_{n_i}) \in R_i^I$. We sometimes denote an instance by its set of facts. 

As is customary in the data exchange literature, we consider instances with two types 
of values: constants and nulls [22]. More precisely, let $\mathbf{C}$ and $\mathbf{N}$ be infinite and disjoint sets 
of constants and nulls, respectively, and take the domain $\mathbf{D}$ to be $\mathbf{C} \cup \mathbf{N}$. If we refer to a 
schema $\mathbf{S}$ as a source schema, then we assume that for every instance $I$ of $\mathbf{S}$, it holds that 
$\mathrm{dom}(I) \subseteq \mathbf{C}$. On the other hand, if we refer to a schema $\mathbf{T}$ as a target schema, then for 
every instance $J$ of $\mathbf{T}$, it holds that $\mathrm{dom}(J) \subseteq \mathbf{C} \cup \mathbf{N}$. The distinction between constants 
and nulls is important in the definition of a homomorphism (which we give later). 

2.1. Source-to-target and target dependencies. Fix a source schema $\mathbf{S}$ and a target 
schema $\mathbf{T}$, and assume that $\mathbf{S}$ and $\mathbf{T}$ do not have predicate symbols in common. Then a 
source-to-target tuple-generating dependency (st-tgd) is a first-order sentence of the form: 

\[ \forall \bar{x} \, (\varphi(\bar{x}) \to \exists \bar{y} \, \psi(\bar{x}, \bar{y})), \]

where $\varphi(\bar{x})$ is a conjunction of relational atoms over $\mathbf{S}$ and $\psi(\bar{x}, \bar{y})$ is a conjunction of 
relational atoms over $\mathbf{T}$. We assume a safety condition, that every member of $\bar{x}$ actually 
appears in a relational atom in $\varphi(\bar{x})$. A target equality-generating dependency (t-egd) is a 
first-order sentence of the form: 

\[ \forall \bar{x} \, (\varphi(\bar{x}) \to u = v), \]

where $\varphi(\bar{x})$ is a conjunction of relational atoms over $\mathbf{T}$ and $u$, $v$ are among the variables 
mentioned in $\bar{x}$. We again assume the same safety condition. In several of the examples 
we give in this paper, we shall make use of special t-egds called key dependencies, which 
say that one attribute of a binary relation is a key for that relation (of course, we could 
define more general key dependencies if we wanted). The key dependencies we consider are 
either of the form $R(x, y) \wedge R(x, z) \to y = z$ (which says that the first attribute is a key) or 
$S(y, x) \wedge S(z, x) \to y = z$ (which says that the second attribute is a key). Finally, a target 
tuple-generating dependency (t-tgd) is a first-order sentence of the form: 

\[ \forall \bar{x} \, (\varphi(\bar{x}) \to \exists \bar{y} \, \psi(\bar{x}, \bar{y})), \]

where both $\varphi(\bar{x})$ and $\psi(\bar{x}, \bar{y})$ are conjunctions of relational atoms over $\mathbf{T}$, and where we 
again assume the same safety condition. 

The notion of satisfaction of a t-egd $\sigma$ by a target instance $J$, denoted by $J \models \sigma$, is 
defined as the standard notion of satisfaction in first-order logic, and likewise for t-tgds. 
For the case of an st-tgd $\sigma$, a source instance $I$ and a target instance $J$, the pair $(I, J)$ is 
said to satisfy $\sigma$, denoted by $(I, J) \models \sigma$, if the following instance $K$ of $\mathbf{S} \cup \mathbf{T}$ satisfies $\sigma$ in 
the standard first-order logic sense. For every relation symbol $S \in \mathbf{S}$, relation $S^K$ is defined 
as $S^I$, and for every relation symbol $T \in \mathbf{T}$, relation $T^K$ is defined as $T^J$. As usual, a set 
$\Sigma_{st}$ of st-tgds is said to be satisfied by a pair $(I, J)$, denoted by $(I, J) \models \Sigma_{st}$, if $(I, J) \models \sigma$ 
for every $\sigma \in \Sigma_{st}$ (and likewise for a set of t-egds and t-tgds). 

2.2. Schema mappings. In general, a schema mapping from a source schema $\mathbf{S}$ to a target 
schema $\mathbf{T}$ is a set of pairs $(I, J)$, where $I$ is an instance of $\mathbf{S}$ and $J$ is an instance of $\mathbf{T}$. In 
this paper, we restrict our attention to some classes of schema mappings that are specified 
in some logical formalisms. We may sometimes refer to two schema mappings with the 
same set of $(I, J)$ pairs as equivalent, to capture the idea that the formulas that specify 
them are logically equivalent. A schema mapping $\mathcal{M}$ from $\mathbf{S}$ to $\mathbf{T}$ is said to be an st-tgd 
mapping if there exists a finite set $\Sigma_{st}$ of st-tgds such that $(I, J)$ belongs to $\mathcal{M}$ if and only 
if $(I, J) \models \Sigma_{st}$, for every pair $I$, $J$ of instances of $\mathbf{S}$ and $\mathbf{T}$, respectively. We use notation 
$\mathcal{M} = (\mathbf{S}, \mathbf{T}, \Sigma_{st})$ to indicate that $\mathcal{M}$ is specified by $\Sigma_{st}$. Moreover, a schema mapping $\mathcal{M}$ 
from $\mathbf{S}$ to $\mathbf{T}$ is said to be a standard schema mapping if there exists a finite set $\Sigma_{st}$ of st-tgds 
and a finite set $\Sigma_t$ consisting of a set of t-egds and a weakly acyclic set of t-tgds, such that 
$(I, J)$ belongs to $\mathcal{M}$ if and only if $(I, J) \models \Sigma_{st}$ and $J \models \Sigma_t$, for every pair $I$, $J$ of instances 
of $\mathbf{S}$ and $\mathbf{T}$, respectively; notation $\mathcal{M} = (\mathbf{S}, \mathbf{T}, \Sigma_{st}, \Sigma_t)$ is used in this case to indicate that 
$\mathcal{M}$ is specified by $\Sigma_{st}$ and $\Sigma_t$. We occasionally allow a finite set $\Sigma_s$ of source constraints in 
some of our schema mappings: we then use the notation $\mathcal{M} = (\mathbf{S}, \mathbf{T}, \Sigma_s, \Sigma_{st}, \Sigma_t)$. 

To define the widely used notion of weak acyclicity, we need to introduce some 
terminology. For a set $\Gamma$ of t-tgds over $\mathbf{T}$, define the dependency graph $G_\Gamma$ of $\Gamma$ as follows. 

• For every relation name $T$ in $\mathbf{T}$ of arity $n$, and for every $i \in \{1, \ldots, n\}$, include a node 
$(T, i)$ in $G_\Gamma$. 

• Include an edge $(T_1, i) \to (T_2, j)$ in $G_\Gamma$ if there exist a t-tgd $\forall \bar{x} \, (\varphi(\bar{x}) \to \exists \bar{y} \, \psi(\bar{x}, \bar{y}))$ in $\Gamma$ 
and a variable $x$ in $\bar{x}$ such that $x$ occurs in the $i$-th attribute of $T_1$ in a conjunct of $\varphi$ 
and in the $j$-th attribute of $T_2$ in a conjunct of $\psi$. 

• Include a special edge $(T_1, i) \to^{*} (T_2, j)$ in $G_\Gamma$ if there exist a t-tgd $\forall \bar{x} \, (\varphi(\bar{x}) \to \exists \bar{y} \, \psi(\bar{x}, \bar{y}))$ 
in $\Gamma$ and variables $x$, $y$ in $\bar{x}$ and $\bar{y}$, respectively, such that $x$ occurs in the $i$-th attribute 
of $T_1$ in a conjunct of $\varphi$ and $y$ occurs in the $j$-th attribute of $T_2$ in a conjunct of $\psi$. 

Then the set $\Gamma$ of t-tgds is said to be weakly acyclic if its dependency graph $G_\Gamma$ has no cycle 
through a special edge [22]. 
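
The definition above translates directly into a small graph test. The following Python sketch is illustrative only; it assumes a hypothetical encoding in which a t-tgd is a pair (body, head) of lists of atoms, an atom is a pair (relation name, tuple of variable names), and the existential variables are exactly the head variables that do not occur in the body. The function names are not from the paper.

```python
def dependency_graph(tgds):
    """Return the sets of regular and special edges between positions (T, i)."""
    regular, special = set(), set()
    for body, head in tgds:
        body_vars = {v for _, args in body for v in args}
        for rel_b, args_b in body:
            for i, x in enumerate(args_b, start=1):
                for rel_h, args_h in head:
                    for j, y in enumerate(args_h, start=1):
                        if y == x:
                            # x is propagated from (rel_b, i) to (rel_h, j).
                            regular.add(((rel_b, i), (rel_h, j)))
                        elif y not in body_vars:
                            # y is existential: special edge.
                            special.add(((rel_b, i), (rel_h, j)))
    return regular, special


def is_weakly_acyclic(tgds):
    """True iff no cycle of the dependency graph goes through a special edge."""
    regular, special = dependency_graph(tgds)
    succ = {}
    for u, v in regular | special:
        succ.setdefault(u, set()).add(v)

    def reachable(src, dst):
        seen, stack = set(), [src]
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(succ.get(node, ()))
        return False

    # A special edge u ->* v lies on a cycle iff u is reachable from v.
    return not any(reachable(v, u) for u, v in special)
```

Under this encoding, the single t-tgd E(x, y) → ∃z E(y, z) is reported as not weakly acyclic, whereas the full tgd E(x, y) ∧ E(y, z) → E(x, z) and the inclusion dependency A(x) → ∃y B(x, y) are reported as weakly acyclic.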

Given a schema mapping $\mathcal{M}$, if a pair $(I, J)$ belongs to it, then $J$ is said to be a solution 
for $I$ with respect to $\mathcal{M}$. A universal solution [22] for $I$ is a solution with a homomorphism 
to every solution for $I$. A homomorphism from instance $J_1$ to instance $J_2$ is a function $h$ 
from $\mathbf{C} \cup \mathbf{N}$ to $\mathbf{C} \cup \mathbf{N}$ such that (1) for each $c$ in $\mathbf{C}$, we have that $h(c) = c$, and (2) whenever 
$R(a_1, \ldots, a_n)$ is a fact of $J_1$, then $R(h(a_1), \ldots, h(a_n))$ is a fact of $J_2$. 
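
The two conditions in the definition of a homomorphism can be checked mechanically for a candidate function. The sketch below is only an illustration under assumed conventions: nulls are strings beginning with an underscore, all other values are constants, instances are dictionaries from relation names to sets of tuples, and the candidate `h` is a dictionary that is implicitly the identity on values it does not list.

```python
def is_null(value):
    """Assumed convention: nulls are strings that start with an underscore."""
    return isinstance(value, str) and value.startswith("_")


def is_homomorphism(h, j1, j2):
    """Check that h (a dict, identity elsewhere) is a homomorphism from j1 to j2."""
    # Condition (1): every constant must be mapped to itself.
    if any(not is_null(v) and w != v for v, w in h.items()):
        return False
    # Condition (2): the image of every fact of j1 must be a fact of j2.
    return all(
        tuple(h.get(a, a) for a in fact) in j2.get(rel, set())
        for rel, facts in j1.items()
        for fact in facts
    )
```

For example, with $J_1 = \{R(1, \_n)\}$ and $J_2 = \{R(1, 2)\}$, the function sending the null `_n` to 2 passes the check, while any function that moves the constant 1 does not.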

2.3. Second-order dependencies. In this paper, we also consider schema mappings that 
are specified by second-order dependencies. In the definition of these dependencies, the 
following terminology is used. Given a collection $\bar{x}$ of variables and a collection $\bar{f}$ of function 
symbols, a term (based on $\bar{x}$ and $\bar{f}$) with depth of nesting $d$ is defined recursively as follows: 

(1) Every member of $\bar{x}$ and every 0-ary function symbol (constant symbol) of $\bar{f}$ is a term 
with depth of nesting 0. 

(2) If $f$ is a $k$-ary function symbol in $\bar{f}$ with $k \geq 1$, and if $t_1, \ldots, t_k$ are terms, with maximum 
depth of nesting $d - 1$, then $f(t_1, \ldots, t_k)$ is a term with depth of nesting $d$. 

Definition 2.1. ([23]) Given a source schema $\mathbf{S}$ and a target schema $\mathbf{T}$, a second-order 
source-to-target tuple-generating dependency or SO tgd (from $\mathbf{S}$ to $\mathbf{T}$) is a second-order 
formula of the form 

\[ \exists \bar{f} \, (\forall \bar{x}_1 (\varphi_1 \to \psi_1) \wedge \cdots \wedge \forall \bar{x}_n (\varphi_n \to \psi_n)), \]

where 

(1) Each member of $\bar{f}$ is a function symbol. 

(2) Each $\varphi_i$ is a conjunction of 

• relational atomic formulas of the form $S(y_1, \ldots, y_k)$, where $S$ is a $k$-ary relation symbol 
of $\mathbf{S}$ and $y_1, \ldots, y_k$ are (not necessarily distinct) variables in $\bar{x}_i$, and 

• equality atoms of the form $t = t'$, where $t$ and $t'$ are terms based on $\bar{x}_i$ and $\bar{f}$. 

(3) Each $\psi_i$ is a conjunction of relational atomic formulas of the form $T(t_1, \ldots, t_\ell)$, where 
$T$ is an $\ell$-ary relation symbol of $\mathbf{T}$ and $t_1, \ldots, t_\ell$ are terms based on $\bar{x}_i$ and $\bar{f}$. 

(4) Each variable in $\bar{x}_i$ appears in some relational atomic formula of $\varphi_i$. □ 
The fourth condition is the safety condition for SO tgds. Note that it is "built into" 
SO tgds that they are source-to-target. The depth of nesting of an SO tgd is the maximal 
depth of nesting of the terms that appear in it. We say that the SO tgd is unnested if its 
depth of nesting is at most 1. Thus, an unnested SO tgd can contain terms like f(x), but 
not terms like f(g(x)). 
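
The recursive definition of depth of nesting, and the resulting notion of an unnested dependency, can be made concrete with a short Python sketch. The term encoding is an assumption for illustration only: a variable or constant symbol is a string, and an application $f(t_1, \ldots, t_k)$ with $k \geq 1$ is a tuple whose first component is the function name.

```python
def depth(term):
    """Depth of nesting of a term.

    A term is either a string (variable or 0-ary function symbol) or a tuple
    (f, t1, ..., tk) with k >= 1 standing for f(t1, ..., tk).
    """
    if isinstance(term, str):
        return 0
    _, *args = term
    return 1 + max(depth(t) for t in args)


def is_unnested(terms):
    """True iff every term in the collection has depth of nesting at most 1."""
    return all(depth(t) <= 1 for t in terms)
```

With this encoding, `("f", "x")` stands for f(x) and has depth 1, while `("f", ("g", "x"))` stands for f(g(x)) and has depth 2, so only the former may occur in an unnested SO tgd.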

As was noted in [23, 42], there is a subtlety in the semantics of SO tgds, namely, the 
semantics of existentially quantified function symbols. In particular, in deciding whether 
$(I, J) \models \sigma$, for an SO tgd $\sigma$, what should the domain and range of the functions instantiating 
the existentially quantified function symbols be? The obvious choice is to let the domain 
and range be the active domain of $(I, J)$, but it is shown in [23, 42] that this does not work 
properly. Instead, the solution in [23, 42] is as follows. Let $\sigma$ be an SO tgd from a source 
schema $\mathbf{S}$ to a target schema $\mathbf{T}$. Then given an instance $I$ of $\mathbf{S}$ and an instance $J$ of $\mathbf{T}$, 
instance $(I, J)$ is converted into a structure $(U; I, J)$, which is just like $(I, J)$ except that 
it has a universe $U$. The domain and range of the functions in $\sigma$ are then taken to be $U$. 
The universe $U$ is taken to be a countably infinite set that includes $\mathrm{dom}(I) \cup \mathrm{dom}(J)$. The 
intuition is that the universe contains the active domain along with an infinite set of nulls. 
Then $(I, J)$ is said to satisfy $\sigma$, denoted by $(I, J) \models \sigma$, if $(U; I, J) \models \sigma$ under the standard 
notion of satisfaction in second-order logic (see, for example, [IE]). It should be noticed 
that it is proven in [23] that in the case of SO tgds, instead of taking the universe $U$ to be 
infinite, one can take it to be finite and "sufficiently large", whereas in [32] this is shown to 
be insufficient in the presence of unrestricted target constraints. 

The class of SO tgds was introduced in [23] to deal with the problem of composing 
schema mappings. More specifically, given a schema mapping $\mathcal{M}_{12}$ from a schema $\mathbf{S}_1$ to 
a schema $\mathbf{S}_2$ and a schema mapping $\mathcal{M}_{23}$ from $\mathbf{S}_2$ to a schema $\mathbf{S}_3$, the composition of 
these two schema mappings, denoted by $\mathcal{M}_{12} \circ \mathcal{M}_{23}$, is defined as the schema mapping consisting of 
all pairs $(I_1, I_3)$ of instances for which there exists an instance $I_2$ of $\mathbf{S}_2$ such that $(I_1, I_2)$ 
belongs to $\mathcal{M}_{12}$ and $(I_2, I_3)$ belongs to $\mathcal{M}_{23}$. It was shown in [23] that the composition of 
an arbitrary number of st-tgd mappings is specified by an SO tgd, that SO tgds are closed 
under composition, and that every SO tgd specifies the composition of a finite number of 
st-tgd mappings. 
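
Semantically, composition is just the composition of binary relations on instances. Schema mappings are in general infinite sets of pairs, so the following Python sketch is only a toy illustration on finite fragments; the representation of an instance as a frozenset of facts and the name `compose` are assumptions made here, not notation from the paper.

```python
def compose(m12, m23):
    """Relational composition of two finite sets of (instance, instance) pairs.

    Instances must be hashable, e.g. frozensets of facts.
    """
    return {
        (i1, i3)
        for (i1, i2) in m12
        for (j2, i3) in m23
        if i2 == j2  # the intermediate instance must be shared
    }
```

A pair $(I_1, I_3)$ is in the result exactly when some intermediate instance $I_2$ links the two fragments, which mirrors the existential quantification over $I_2$ in the definition.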

3. A Negative Result: SO tgds are not Enough 

As pointed out in the previous section, SO tgds were introduced in [23] to deal with the 
problem of composing schema mappings. Thus, SO tgds are a natural starting point for the 
study of languages for defining the composition of schema mappings with target constraints, 
which is the goal of this paper. Unfortunately, it can be easily proved that this language 
is not rich enough to be able to specify the composition of some simple schema mappings 
with target constraints. We now give an example. 

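Example 3.1. Let $\mathcal{M}_{12} = (\mathbf{S}_1, \mathbf{S}_2, \Sigma_{12}, \Sigma_2)$ and $\mathcal{M}_{23} = (\mathbf{S}_2, \mathbf{S}_3, \Sigma_{23})$, where $\mathbf{S}_1 = \{P(\cdot, \cdot)\}$, 
$\mathbf{S}_2 = \{R(\cdot, \cdot)\}$, $\mathbf{S}_3 = \{T(\cdot, \cdot)\}$ and 

\[ \Sigma_{12} = \{P(x, y) \to R(x, y)\}, \]

\[ \Sigma_2 = \{R(x, y) \wedge R(x, z) \to y = z\}, \]

\[ \Sigma_{23} = \{R(x, y) \to T(x, y)\}. \]

Notice that $\Sigma_2$ consists of a key dependency over $\mathbf{S}_2$. 
Let $\mathcal{M}_{13} = \mathcal{M}_{12} \circ \mathcal{M}_{23}$. We now show that $\mathcal{M}_{13}$ cannot be specified by an SO tgd 
$\sigma_{13}$. Assume that it were; we shall derive a contradiction. If $I_1$ is an arbitrary instance of 
$\mathbf{S}_1$, then there is $I_3$ such that $(I_1, I_3) \in \mathcal{M}_{13}$ (for example, we could take $I_3$ to be the 
result of chasing $I_1$ with $\sigma_{13}$, as in [23]). However, let $I_1$ be the instance of $\mathbf{S}_1$ such that 
$P^{I_1} = \{(1, 2), (1, 3)\}$. Then there is no instance $I_3$ of $\mathbf{S}_3$ such that $(I_1, I_3) \in \mathcal{M}_{12} \circ \mathcal{M}_{23}$, 
since $I_1$ does not have any solutions with respect to $\mathcal{M}_{12}$. □ 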
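
The reason why the instance with $P^{I_1} = \{(1, 2), (1, 3)\}$ has no solution can be checked mechanically: the st-tgd forces every $P$-fact into $R$, and the key dependency would then have to equate the distinct constants 2 and 3. The Python sketch below is a hand-written test specific to the mapping $\mathcal{M}_{12}$ of Example 3.1 (it is not a general chase), and the function name is hypothetical.

```python
def has_solution_m12(p_facts):
    """Decide whether a source instance has a solution w.r.t. M12 of Example 3.1.

    P(x, y) -> R(x, y) copies every P-fact into R, and the first attribute of R
    is a key. Source values are constants, so a solution exists exactly when no
    two P-facts share a first component but differ in the second.
    """
    second_for = {}
    for x, y in p_facts:
        if second_for.setdefault(x, y) != y:
            return False  # the key would equate two distinct constants
    return True
```

On the instance of the example the function returns False, while on $\{(1, 2), (2, 3)\}$ it returns True, since copying $P$ into $R$ then already satisfies the key.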

From the previous example, we obtain the following proposition. 

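Proposition 3.1. There exist schema mappings $\mathcal{M}_{12} = (\mathbf{S}_1, \mathbf{S}_2, \Sigma_{12}, \Sigma_2)$ and $\mathcal{M}_{23} = 
(\mathbf{S}_2, \mathbf{S}_3, \Sigma_{23})$, where $\Sigma_{12}$ and $\Sigma_{23}$ are sets of st-tgds and $\Sigma_2$ is a set of key dependencies, 
such that $\mathcal{M}_{12} \circ \mathcal{M}_{23}$ cannot be specified by an SO tgd. 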

Proposition 3.1 does not rule out the possibility that the composition of $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ 
can be specified by using an SO tgd together with some source and target constraints. In 
fact, if $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ are as in Example 3.1, then the composition $\mathcal{M}_{12} \circ \mathcal{M}_{23}$ can be specified 
by a set of st-tgds together with some source constraints: specifically, $\mathcal{M}_{12} \circ \mathcal{M}_{23} = \mathcal{M}_{13}$, 
where $\mathcal{M}_{13} = (\mathbf{S}_1, \mathbf{S}_3, \Sigma_1, \Sigma_{13})$ and 

\[ \Sigma_1 = \{P(x, y) \wedge P(x, z) \to y = z\}, \]

\[ \Sigma_{13} = \{P(x, y) \to T(x, y)\}. \]

A natural question is then whether the language of SO tgds together with source and target 
constraints is the right language for defining the composition of schema mappings with 
source and target constraints. Unfortunately, the following theorem shows that this is not 
the case. 

Theorem 3.2. There exist schema mappings $\mathcal{M}_{12} = (\mathbf{S}_1, \mathbf{S}_2, \Sigma_{12}, \Sigma_2)$ and $\mathcal{M}_{23} = (\mathbf{S}_2, \mathbf{S}_3, 
\Sigma_{23})$, where $\Sigma_{12}$ and $\Sigma_{23}$ are sets of st-tgds and $\Sigma_2$ is a set of key dependencies, such that 
$\mathcal{M}_{12} \circ \mathcal{M}_{23}$ cannot be specified by any schema mapping of the form $(\mathbf{S}_1, \mathbf{S}_3, \sigma_1, \sigma_{13}, \sigma_3)$, 
where $\sigma_1$ is an arbitrary source constraint, $\sigma_{13}$ is an SO tgd, and $\sigma_3$ is an arbitrary target 
constraint. 

If we view a source constraint as a set of allowed source instances, then when we say 
that $\sigma_1$ is an "arbitrary source constraint" in Theorem 3.2, we mean that $\sigma_1$ allows an 
arbitrary set of source instances. A similar comment applies to $\sigma_3$ being an "arbitrary 
target constraint". 

To prove this theorem, we use a notion of locality from [5]. Notions of locality [24, 27, 
32, 37] have been widely used to prove inexpressibility results for first-order logic (FO) and 
some of its extensions. The intuition underlying those notions of locality is that FO cannot 
express properties (such as connectivity, cyclicity, etc.) that involve nontrivial recursive 
computations. The setting of locality is as follows. The Gaifman graph $\mathcal{G}(I)$ of an instance 
$I$ of a schema $\mathbf{S}$ is the graph whose nodes are the elements of $\mathrm{dom}(I)$, and such that there 
exists an edge between $a$ and $b$ in $\mathcal{G}(I)$ if and only if $a$ and $b$ belong to the same tuple of a 
relation $R^I$, for some $R \in \mathbf{S}$. For example, if $I$ is an undirected graph, then $\mathcal{G}(I)$ is $I$ itself. 
The distance between two elements $a$ and $b$ in $I$ is considered to be the distance between 
them in $\mathcal{G}(I)$. Given $a \in \mathrm{dom}(I)$, the instance $N_d^I(a)$, called the $d$-neighborhood of $a$ in $I$, is 
defined as the restriction of $I$ to the elements at distance at most $d$ from $a$, with $a$ treated 
as a distinguished element (a constant in the vocabulary). 
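
The Gaifman graph and the $d$-neighborhood of an element can be computed with a breadth-first search. The Python sketch below is illustrative only: instances are assumed to be dictionaries from relation names to sets of tuples, the function names are hypothetical, and the distinguished element $a$ is simply passed as an argument rather than added to the vocabulary as a constant.

```python
from collections import deque
from itertools import combinations


def gaifman_graph(instance):
    """Adjacency sets: two elements are adjacent iff they share a tuple."""
    adj = {}
    for facts in instance.values():
        for fact in facts:
            for v in fact:
                adj.setdefault(v, set())
            for u, v in combinations(set(fact), 2):
                adj[u].add(v)
                adj[v].add(u)
    return adj


def neighborhood(instance, a, d):
    """Restriction of the instance to the elements at distance at most d from a."""
    adj = gaifman_graph(instance)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue  # do not explore beyond radius d
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    keep = set(dist)
    return {
        rel: {fact for fact in facts if set(fact) <= keep}
        for rel, facts in instance.items()
    }
```

On a path a, a1, a2, c given by a binary relation E, the 2-neighborhood of a keeps the facts E(a, a1) and E(a1, a2) and drops E(a2, c), because c is at distance 3 from a.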
The notion of neighborhood of a point is used in [5] to introduce a notion of locality 
for data transformations. Before we give this definition, we give the standard recursive 
definition of the quantifier rank $\mathrm{qr}(\varphi)$ of an FO-formula $\varphi$. 

• If $\varphi$ is quantifier-free, then $\mathrm{qr}(\varphi) = 0$. 

• $\mathrm{qr}(\neg \varphi) = \mathrm{qr}(\varphi)$. 

• $\mathrm{qr}(\varphi_1 \wedge \varphi_2) = \max\{\mathrm{qr}(\varphi_1), \mathrm{qr}(\varphi_2)\}$. 

• $\mathrm{qr}(\forall x \, \varphi) = 1 + \mathrm{qr}(\varphi)$. 

Following [5], we write $N_d^I(a) \equiv_k N_d^I(b)$ to mean that $N_d^I(a)$ and $N_d^I(b)$ agree on all 
FO-sentences of quantifier rank at most $k$; that is, for every FO-sentence $\varphi$ of quantifier rank 
at most $k$, we have that $N_d^I(a) \models \varphi$ if and only if $N_d^I(b) \models \varphi$. 
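
The recursive definition of quantifier rank can be read off as a short function. In this Python sketch, formulas are encoded as nested tuples, which is an assumption made purely for illustration: `("atom", text)`, `("not", phi)`, `("and", phi1, phi2)` and `("forall", variable, phi)`, matching the connectives listed in the definition.

```python
def qr(phi):
    """Quantifier rank of a formula given as a nested tuple."""
    op = phi[0]
    if op == "atom":
        return 0
    if op == "not":
        return qr(phi[1])
    if op == "and":
        return max(qr(phi[1]), qr(phi[2]))
    if op == "forall":
        return 1 + qr(phi[2])
    raise ValueError(f"unknown connective: {op!r}")
```

For example, the formula $\forall x \, (E(x, y) \wedge \neg \forall y \, P(y))$ has quantifier rank 2, since one quantifier is nested inside the other.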

Definition 3.3. ([5]) Given a source schema $\mathbf{S}$ and a target schema $\mathbf{T}$, a mapping $\mathfrak{F} : \mathbf{S} \to 
\mathbf{T}$ is locally consistent under FO-equivalence if for every $r, \ell \geq 0$ there exist $d, k \geq 0$ such 
that, for every instance $I$ of $\mathbf{S}$ and $a, b \in \mathrm{dom}(I)$, if $N_d^I(a) \equiv_k N_d^I(b)$, then 

(1) $a \in \mathrm{dom}(\mathfrak{F}(I))$ if and only if $b \in \mathrm{dom}(\mathfrak{F}(I))$, and 

(2) $N_r^{\mathfrak{F}(I)}(a) \equiv_\ell N_r^{\mathfrak{F}(I)}(b)$. □ 

For a fixed schema mapping $(\mathbf{S}, \mathbf{T}, \Sigma_{st})$, we denote by $\mathfrak{F}_{\mathrm{univ}}$ the transformation from 
$\mathbf{S}$ to $\mathbf{T}$, such that $\mathfrak{F}_{\mathrm{univ}}(I)$ is the canonical universal solution for $I$, which is obtained by 
doing a naive chase of $I$ with $\Sigma_{st}$. 

Proposition 3.4 ([5]). For every st-tgd mapping, the transformation $\mathfrak{F}_{\mathrm{univ}}$ is locally 
consistent under FO-equivalence. 

The previous proposition can be easily extended to the case of a composition of a finite 
number of st-tgd mappings. 

Lemma 3.5. Let $n \geq 2$. For every $i \in [1, n-1]$, let $\mathcal{M}_i = (\mathbf{S}_i, \mathbf{S}_{i+1}, \Sigma_{i,i+1})$ be a schema 
mapping specified by a set $\Sigma_{i,i+1}$ of st-tgds, and let $\mathfrak{F}^i_{\mathrm{univ}}$ be the canonical universal solution 
transformation for $\mathcal{M}_i$. Assume that $\mathfrak{F}$ is the transformation from $\mathbf{S}_1$ to $\mathbf{S}_n$ defined as: 

\[ \mathfrak{F}(I_1) = \mathfrak{F}^{n-1}_{\mathrm{univ}}(\cdots(\mathfrak{F}^{2}_{\mathrm{univ}}(\mathfrak{F}^{1}_{\mathrm{univ}}(I_1)))\cdots), \]

for every instance $I_1$ of $\mathbf{S}_1$. Then $\mathfrak{F}$ is locally consistent under FO-equivalence. 

Lemma 3.5 is one of the key components in the proof of Theorem 3.2. We shall also 
utilize the next proposition (Proposition 3.6) in the proof of Theorem 3.2. Proposition 3.6 
is a generalization of Proposition 7.2 of [19], which says that for the composition of two 
st-tgd mappings, the "chase of the chase" is a universal solution. 

Proposition 3.6. Let $\mathcal{M}_1, \ldots, \mathcal{M}_k$ be schema mappings, each specified by st-tgds, target 
egds, and target tgds. Let $\mathcal{M} = \mathcal{M}_1 \circ \cdots \circ \mathcal{M}_k$, and let $I$ be a source instance for $\mathcal{M}_1$. Let 
$U$ be a result (if it exists) of chasing $I$ with $\mathcal{M}_1$, then chasing the result with $\mathcal{M}_2$, ..., and 
then chasing the result with $\mathcal{M}_k$. Then $U$ is a universal solution for $I$ with respect to $\mathcal{M}$. 

Proof. We use a simple trick from the proof of Proposition 7.2 in [19]. Define $\mathcal{M}'$ to be 
a schema mapping whose st-tgds consist of the st-tgds of $\mathcal{M}_1$, whose target egds consist 
of the union of the target egds of $\mathcal{M}_1, \ldots, \mathcal{M}_k$, and whose target tgds consist of all of the 
st-tgds of $\mathcal{M}_2, \ldots, \mathcal{M}_k$, along with all of the target tgds of $\mathcal{M}_1, \ldots, \mathcal{M}_k$. If $I$ is a source 
instance for $\mathcal{M}_1$, and $J$ is a target instance for $\mathcal{M}_k$, then it is easy to see that $J$ is a solution 
for $I$ with respect to $\mathcal{M}$ if and only if $J$ is a solution for $I$ with respect to $\mathcal{M}'$. Hence, $\mathcal{M}$ 
and $\mathcal{M}'$ are the same schema mapping semantically, in that they consist of the same set of 
$(I, J)$ pairs. If $I$ and $U$ are as in the statement of the proposition, then it is easy to see 
that $U$ is a result of a chase of $I$ with $\mathcal{M}'$. So from Theorem 3.3 of [22], we have that $U$ is a 
universal solution of $I$ with respect to $\mathcal{M}'$. Since $\mathcal{M}$ and $\mathcal{M}'$ are the same schema mapping 
semantically, it follows that $U$ is a universal solution of $I$ with respect to $\mathcal{M}$. 
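
For mappings specified by st-tgds only, the canonical universal solution and the "chase of the chase" can be sketched in a few lines of Python. This is a minimal illustration under assumed conventions, not the chase procedure of the paper: an st-tgd is a pair (body, head) of lists of atoms over variable names, instances are dictionaries from relation names to sets of tuples, fresh nulls are strings of the hypothetical form `_N0`, `_N1`, ..., and target egds and target tgds are not handled at all.

```python
import itertools

_fresh = itertools.count()  # supplies labels for fresh nulls


def matches(body, instance):
    """Yield every assignment of the body variables that maps all atoms to facts."""
    def extend(i, binding):
        if i == len(body):
            yield dict(binding)
            return
        rel, args = body[i]
        for fact in instance.get(rel, ()):
            new = dict(binding)
            if all(new.setdefault(v, c) == c for v, c in zip(args, fact)):
                yield from extend(i + 1, new)
    yield from extend(0, {})


def naive_chase(instance, st_tgds):
    """Naive chase: one batch of fresh nulls per st-tgd and per body match."""
    target = {}
    for body, head in st_tgds:
        for binding in matches(body, instance):
            for _, args in head:
                for v in args:
                    if v not in binding:  # existential variable
                        binding[v] = f"_N{next(_fresh)}"
            for rel, args in head:
                target.setdefault(rel, set()).add(tuple(binding[v] for v in args))
    return target
```

Applying `naive_chase` with the st-tgds of a first mapping and then applying it again to the result with the st-tgds of a second mapping yields the "chase of the chase" for two st-tgd mappings; in the second step the nulls created by the first step are treated like any other value.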

We need to say "if it exists" in the statement of Proposition 3.6, since there are two 
reasons that a result of the chase may not exist. First, a target egd may try to equate 
two constants during a chase. Second, target tgds might force an "infinite chase". These 
problems do not arise for the composition of two st-tgd mappings, the case considered in 
Proposition 7.2 of [19]. 

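Proof of Theorem 3.2. Let $\mathcal{M}_{12} = (\mathbf{S}_1, \mathbf{S}_2, \Sigma_{12}, \Sigma_2)$ and $\mathcal{M}_{23} = (\mathbf{S}_2, \mathbf{S}_3, \Sigma_{23})$ be schema 
mappings, where: 

\[ \mathbf{S}_1 = \{E(\cdot, \cdot), P_1(\cdot), Q_1(\cdot)\}, \]

\[ \mathbf{S}_2 = \{P_2(\cdot), Q_2(\cdot), R(\cdot, \cdot), S(\cdot, \cdot)\}, \]

\[ \mathbf{S}_3 = \{V(\cdot)\}, \]

and 

\[ \Sigma_{12} = \{P_1(x) \to P_2(x), \]
\[ \quad Q_1(x) \to Q_2(x), \]
\[ \quad E(x, y) \to \exists z_1 \exists z_2 \exists z_3 \, (R(x, z_1) \wedge R(y, z_2) \wedge S(z_1, z_3) \wedge S(z_2, z_3))\}, \]

\[ \Sigma_2 = \{R(x, y) \wedge R(x, z) \to y = z, \]
\[ \quad S(x, y) \wedge S(x, z) \to y = z, \]
\[ \quad S(y, x) \wedge S(z, x) \to y = z\}, \]

\[ \Sigma_{23} = \{P_2(x) \wedge R(x, z) \wedge R(y, z) \wedge Q_2(y) \to V(x)\}. \]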

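First, we show that $\mathcal{M}_{12} \circ \mathcal{M}_{23}$ cannot be specified by an SO tgd. For
the sake of contradiction, assume that $\sigma_{13}$ is an SO tgd from $\mathbf{S}_1$ to
$\mathbf{S}_3$ and that schema mapping $\mathcal{M}_{13} = (\mathbf{S}_1, \mathbf{S}_3, \sigma_{13})$
is the composition of $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$; that is,
$(I_1, I_3) \in \mathcal{M}_{12} \circ \mathcal{M}_{23}$ if and only if $(I_1, I_3) \models \sigma_{13}$.

From Theorem 8.2 in [23], we know that every SO tgd specifies the composition of a finite
number of st-tgd mappings. Thus, given that $\mathcal{M}_{13}$ is the composition of
$\mathcal{M}_{12}$ and $\mathcal{M}_{23}$, we have that there exist schema mappings
$\mathcal{M}'_1 = (\mathbf{S}'_1, \mathbf{S}'_2, \Sigma'_{12})$, $\ldots$,
$\mathcal{M}'_{n-1} = (\mathbf{S}'_{n-1}, \mathbf{S}'_n, \Sigma'_{n-1\,n})$ such that $n > 2$,
$\mathbf{S}'_1 = \mathbf{S}_1$, $\mathbf{S}'_n = \mathbf{S}_3$, $\Sigma'_{i\,i+1}$ is a set of st-tgds
for every $i \in \{1, \ldots, n-1\}$, and $\mathcal{M}'_1 \circ \cdots \circ \mathcal{M}'_{n-1}$ equals
the composition of $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$, that is, for every pair
$(I_1, I_3) \in \mathrm{Inst}(\mathbf{S}_1) \times \mathrm{Inst}(\mathbf{S}_3)$:

\[ (I_1, I_3) \in \mathcal{M}_{12} \circ \mathcal{M}_{23} \iff (I_1, I_3) \in \mathcal{M}'_1 \circ \cdots \circ \mathcal{M}'_{n-1}. \]

For every $i \in \{1, \ldots, n-1\}$, let $\mathfrak{F}^i_{\mathrm{univ}}$ be the canonical solution
transformation for $\mathcal{M}'_i$, and assume that
$\mathfrak{F} : \mathrm{Inst}(\mathbf{S}_1) \to \mathrm{Inst}(\mathbf{S}_3)$ is the transformation
defined as

\[ \mathfrak{F}(I_1) = \mathfrak{F}^{n-1}_{\mathrm{univ}}(\cdots(\mathfrak{F}^2_{\mathrm{univ}}(\mathfrak{F}^1_{\mathrm{univ}}(I_1)))\cdots), \]

for every instance $I_1$ of $\mathbf{S}_1$. From Lemma 3.5 we have that $\mathfrak{F}$ is locally
consistent under FO-equivalence. Thus, for $r = 1$ and $t = 1$, there exist $d, k \geq 0$ such that for
every instance $I_1$ of $\mathbf{S}_1$ and for every $a, b \in \mathrm{dom}(I_1)$, if
$N_d^{I_1}(a) \equiv_k N_d^{I_1}(b)$ then (1) $a \in \mathrm{dom}(\mathfrak{F}(I_1))$ if and only if
$b \in \mathrm{dom}(\mathfrak{F}(I_1))$, and (2)
$N_r^{\mathfrak{F}(I_1)}(a) \equiv_t N_r^{\mathfrak{F}(I_1)}(b)$.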

Define an instance $I_1$ of $\mathbf{S}_1$ with domain $\{a, a_1, \ldots, a_d, b, b_1, \ldots, b_d, c\}$
as follows: $P_1^{I_1} = \{a, b\}$, $Q_1^{I_1} = \{c\}$, and $E^{I_1}$ contains the tuples represented
by the following figure:

[Figure: two $E$-paths, $a \to a_1 \to \cdots \to a_d \to c$ and $b \to b_1 \to \cdots \to b_d$.]

Thus, $E^{I_1} = \{(a,a_1), (a_1,a_2), \ldots, (a_{d-1},a_d), (a_d,c), (b,b_1), (b_1,b_2), \ldots, (b_{d-1},b_d)\}$.
As shown in the figure, $E^{I_1}$ is a union of two paths, one containing $d+2$ elements with first
element $a$ and last element $c$, and another one containing $d+1$ elements with first element $b$.
Observe that $N_d^{I_1}(a) \equiv_k N_d^{I_1}(b)$ since $N_d^{I_1}(a)$ is isomorphic to
$N_d^{I_1}(b)$, with $a$ and $b$ treated as distinguished elements.

Let $I_3$ be the instance of $\mathbf{S}_3$ that contains only the tuple $a$ in $V$ (that is,
$V^{I_3} = \{a\}$). Next we show that $I_3$ is a universal solution for $I_1$ with respect to
$\mathcal{M}_{12} \circ \mathcal{M}_{23}$. A universal solution for $I_1$ in
$\mathcal{M}_{12} \circ \mathcal{M}_{23}$ can be constructed by first chasing with the set
$\Sigma_{12}$ of st-tgds:

[Figure: the result of chasing $I_1$ with $\Sigma_{12}$.]

(where each element in the figure that is not in $\mathrm{dom}(I_1)$ is a fresh null value, which is
represented by a symbol • in the figure), then chasing with the set $\Sigma_2$ of t-egds:

[Figure: the result of chasing the previous instance with $\Sigma_2$.]

and finally chasing with the set $\Sigma_{23}$ of st-tgds. The result of this last step is $I_3$,
since after chasing with $\Sigma_{12}$ and $\Sigma_2$, we have that $P_2$ contains elements $a$ and
$b$, $Q_2$ contains element $c$, and $a$, $c$ is the only pair of elements for which there exists an
element $z$ such that $P_2(a) \wedge R(a,z) \wedge R(c,z) \wedge Q_2(c)$ holds. We conclude that
$I_3$ is a universal solution for $I_1$.

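By Proposition 3.6 we know that $\mathfrak{F}(I_1)$ is a universal solution for $I_1$ with respect
to $\mathcal{M}'_1 \circ \cdots \circ \mathcal{M}'_{n-1}$. Hence, $\mathfrak{F}(I_1)$ is a universal
solution for $I_1$ with respect to $\mathcal{M}_{12} \circ \mathcal{M}_{23}$. Again by Proposition 3.6
we know that $I_3$ is also a universal solution for $I_1$ with respect to
$\mathcal{M}_{12} \circ \mathcal{M}_{23}$. Therefore, since $V^{I_3} = \{a\}$, we conclude that
$a \in V^{\mathfrak{F}(I_1)}$ and $b \notin V^{\mathfrak{F}(I_1)}$. Hence, we have that
$a \in \mathrm{dom}(\mathfrak{F}(I_1))$ and $b \notin \mathrm{dom}(\mathfrak{F}(I_1))$, which
contradicts the fact that $\mathfrak{F}$ is locally consistent under FO-equivalence and
$N_d^{I_1}(a) \equiv_k N_d^{I_1}(b)$. This concludes the proof that
$\mathcal{M}_{12} \circ \mathcal{M}_{23}$ cannot be specified by an SO tgd.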

To conclude the proof of the theorem, we need to show that $\mathcal{M}_{12} \circ \mathcal{M}_{23}$
cannot be specified by an SO tgd together with some arbitrary source and target constraints. For
the sake of contradiction, assume that schema mapping
$\mathcal{M}_{13} = (\mathbf{S}_1, \mathbf{S}_3, \gamma_1, \gamma_{13}, \gamma_3)$ equals the
composition of $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$, where $\gamma_1$ is an arbitrary source
constraint, $\gamma_{13}$ is an SO tgd and $\gamma_3$ is an arbitrary target constraint. Given that
$\mathcal{M}_{12} \circ \mathcal{M}_{23}$ cannot be specified by an SO tgd, we have that either
$\gamma_1$ is not trivial or $\gamma_3$ is not trivial, where an arbitrary constraint is said to be
trivial if it allows all the possible instances. First assume that $\gamma_1$ is not trivial, and let
$I_1$ be an instance of $\mathbf{S}_1$ such that $I_1$ does not satisfy $\gamma_1$ ($I_1$ is not
allowed by $\gamma_1$). Let $\bot$ be a fresh null value and $I_2$ an instance of $\mathbf{S}_2$
defined as:

\begin{align*}
P_2^{I_2} &= P_1^{I_1},\\
Q_2^{I_2} &= Q_1^{I_1},\\
R^{I_2} &= \{(a, \bot) \mid \text{there exists } b \in \mathrm{dom}(I_1) \text{ such that } (a,b) \in E^{I_1} \text{ or } (b,a) \in E^{I_1}\},\\
S^{I_2} &= \{(\bot, \bot)\}.
\end{align*}

Furthermore, let $I_3$ be an instance of $\mathbf{S}_3$ defined as $V^{I_3} = P_2^{I_2}$. It is easy
to see that $(I_1, I_2) \models \Sigma_{12}$, $I_2 \models \Sigma_2$ and
$(I_2, I_3) \models \Sigma_{23}$, which implies that
$(I_1, I_3) \in \mathcal{M}_{12} \circ \mathcal{M}_{23}$. Thus, given that $\mathcal{M}_{13}$ is the
composition of $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$, we have that
$(I_1, I_3) \in \mathcal{M}_{13}$. We conclude that $I_1$ satisfies $\gamma_1$, which contradicts our
initial assumption.

Now suppose that $\gamma_3$ is not trivial, and let $I_3$ be an instance of $\mathbf{S}_3$ such that
$I_3$ does not satisfy $\gamma_3$. Assume that $I_1$, $I_2$ are the empty instances of $\mathbf{S}_1$
and $\mathbf{S}_2$, respectively. It is easy to see that $(I_1, I_2) \models \Sigma_{12}$,
$I_2 \models \Sigma_2$ and $(I_2, I_3) \models \Sigma_{23}$, which implies that
$(I_1, I_3) \in \mathcal{M}_{12} \circ \mathcal{M}_{23}$. Thus, given that $\mathcal{M}_{13}$ is the
composition of $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$, we have that
$(I_1, I_3) \in \mathcal{M}_{13}$, and hence $I_3$ satisfies $\gamma_3$, which contradicts our initial
assumption. This concludes the proof of the theorem. □

4. Source-to-target SO Dependencies 

In Section 3 we showed that SO tgds, even with the assistance of arbitrary source constraints
and arbitrary target constraints, cannot always be used to specify the composition
of mappings with target constraints, even if only key dependencies are allowed as target
constraints of the mappings being composed. In this paper, we define a richer class, called
source-to-target SO dependencies (st-SO dependencies). This class of dependencies is the
source-to-target restriction of the class $\mathrm{Sk}\forall\mathrm{CQ}^{=}$ of dependencies
introduced in [42]. We show that st-SO dependencies (together with appropriate target constraints)
are the right extension of SO tgds for the purpose of expressing the composition of standard schema
mappings. The definition of st-SO dependencies is exactly like the definition of SO tgds in
Definition 2.1 except that condition 3 is changed to:

(3) Each $\psi_i$ is a conjunction of

• relational atomic formulas of the form $T(t_1, \ldots, t_\ell)$, where $T$ is an $\ell$-ary relation
symbol of $\mathbf{T}$ and $t_1, \ldots, t_\ell$ are terms based on $\mathbf{x}_i$ and $\mathbf{f}$, and

• equality atoms of the form $t = t'$, where $t$ and $t'$ are terms based on $\mathbf{x}_i$ and $\mathbf{f}$.

It is sometimes convenient to rewrite an st-SO dependency
$\exists \mathbf{f}(\forall \mathbf{x}_1(\varphi_1 \to \psi_1) \wedge \cdots \wedge \forall \mathbf{x}_n(\varphi_n \to \psi_n))$
so that each conclusion $\psi_i$ is either a conjunction of relational atomic
formulas or a single equality of terms (this is possible because we can recursively replace
$\forall \mathbf{x}_i(\varphi_i \to (\psi_i^1 \wedge \psi_i^2))$ by
$\forall \mathbf{x}_i(\varphi_i \to \psi_i^1) \wedge \forall \mathbf{x}_i(\varphi_i \to \psi_i^2)$
without changing the meaning). Let $\Phi$ be the result of such a rewriting. If $\psi_i$ is a
conjunction of relational atomic formulas, then we refer to $\forall \mathbf{x}_i(\varphi_i \to \psi_i)$
as an SO tgd part of $\Phi$, and if $\psi_i$ is an equality $t = t'$, then we refer to
$\forall \mathbf{x}_i(\varphi_i \to \psi_i)$ as an SO egd part of $\Phi$.

We adopt the same convention for the semantics of st-SO dependencies as was given
in Section 2 for SO tgds, by assuming the existence of a countably infinite universe that
includes the active domain. As with SO tgds, it can be shown that the universe can be
taken to be finite but "sufficiently large".

We shall show that the composition of a finite number of standard schema mappings 
can be specified by an st-SO dependency, together with t-egds and a weakly acyclic set of t- 
tgds. It is convenient to give these second-order schema mappings a name. To emphasize the 
similarity of these second-order schema mappings with the first-order case, we shall refer 
to these second-order schema mappings as SO-standard. Thus, an SO-standard schema 
mapping is one that is specified by an st-SO dependency, together with t-egds and a weakly 
acyclic set of t-tgds. 

Note that st-SO dependencies, like SO tgds, are closed under conjunction. That is, the 
conjunction of two st-SO dependencies is equivalent to a single st-SO dependency. This is 
why we define an SO-standard schema mapping to have only one st-SO dependency, not 
several. Note also that every finite set of st-tgds can be expressed with an SO tgd, and so 
with an st-SO dependency. In particular, every standard schema mapping is an SO-standard 
schema mapping. 

5. The Chase for st-SO Dependencies 

In [23], the well-known chase process is extended so that it applies to an SO tgd $\Phi$. If
we define an SO tgd part of an SO tgd as we did for st-SO dependencies, then the idea
of the chase with SO tgds is that each SO tgd part of $\Phi$ is treated like a tgd (of course,
the conclusion contains Skolem functions rather than existential quantifiers). In deciding
whether the premise of the SO tgd part is instantiated in the instance being chased, two
terms are treated as equal precisely if they are syntactically identical. So a premise
containing the equality atom $f(x) = g(y)$ automatically fails to hold over an instance, and a
premise containing the equality atom $f(g(x)) = f(g(y))$ automatically fails to hold over an
instance unless the instantiation of $x$ equals the instantiation of $y$.

In this section, we discuss how the chase can be extended to apply to an st-SO dependency.
We note that in [42], a chase procedure for the dependencies studied there (which
are like ours but not necessarily source-to-target) was introduced. However, their chase was
not procedural, in that their chase procedure says to set terms $t_1$ and $t_2$ to be equal when
the dependencies logically imply that $t_1 = t_2$. Because of our source-to-target restriction,
we are able to give an explicit, polynomial-time procedure for equating terms.

For clarity, we keep the discussion here informal; it is not hard to convert this into a
formal version. In chasing an instance $I$ with an st-SO dependency $\Phi$, we chase first with
all of the SO egd parts of $\Phi$, and then we chase with all of the SO tgd parts of $\Phi$. We no
longer consider two terms to be equal precisely if they are syntactically identical, since an
SO egd part may force, say, $f(0)$ and $g(1)$ to be equal, even though $f(0)$ and $g(1)$ are not
syntactically identical.

Given a source instance $I$ and an st-SO dependency $\Phi$, we now describe how to chase
$I$ with the SO egd parts of $\Phi$. Let $D$ be the active domain of $I$ (by our assumptions, $D$
consists of constants only). Let $n$ be the maximal depth of nesting over all terms that
appear in $\Phi$. Let $\mathbf{f}$ consist of the function symbols that appear in $\Phi$. Let $T$ be
the set of terms based on $D$ and $\mathbf{f}$ that have depth of nesting at most $n$. This set $T$
is sometimes called the Herbrand universe (with respect to $D$ and $\mathbf{f}$) of depth $n$. It is
straightforward to see (by induction on depth) that the size of $T$ is polynomial in the size of
$D$, for a fixed choice of $\Phi$. We note that if we define $T'$ to be the subset of $T$ that
consists of all terms $t(\mathbf{a})$, where $t(\mathbf{x})$ is a subterm of $\Phi$, and $\mathbf{a}$
is the result of replacing members of $\mathbf{x}$ by values in $D$, then we could work just as well
with $T'$ as with $T$ in defining the chase. However, the proofs are easier to give using $T$
instead of $T'$.
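
The construction of $T$ can be illustrated with a short Python sketch. It is only an illustration under assumptions that are not part of the text: terms are encoded as nested tuples `(symbol, arg1, ..., argk)`, constants are arbitrary non-tuple values, and the helper name `herbrand` and the arity dictionary are hypothetical.

```python
from itertools import product

def herbrand(domain, arities, depth):
    """Return all terms over `domain` and the function symbols in `arities`
    whose nesting depth is at most `depth`.

    Assumed encoding (not from the paper): a constant is any non-tuple value,
    and f(t1, ..., tk) is the tuple ("f", t1, ..., tk).
    """
    terms = set(domain)                  # depth 0: the constants of D
    for _ in range(depth):
        grown = set(terms)
        for symbol, arity in arities.items():
            for args in product(terms, repeat=arity):
                grown.add((symbol, *args))
        terms = grown                    # now closed under one more level of nesting
    return terms

# D = {0, 1}, a unary symbol f and a binary symbol g, nesting depth at most 1.
T = herbrand({0, 1}, {"f": 1, "g": 2}, depth=1)
print(len(T))  # 8: two constants, two f-terms, four g-terms
```

For a fixed set of function symbols and a fixed depth, each pass multiplies the number of terms by a fixed power, which is the informal reason the size of $T$ stays polynomial in the size of $D$.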

We now define a function $F$ with domain the members of $T$. The values $F(t)$ are stored
in a table that is updated repeatedly during the chase process. If $a$ is a member of $D$, then
the initial value of $F(a)$ is $a$ itself (in fact, the value of $F(a)$ will never change for members
$a$ of $D$). If $t$ is a member of $T$ that is not in $D$ (so that $t$ is of the form
$f(t_1, \ldots, t_k)$ for some function symbol $f$), then $F(t)$ is initially taken to be a new null
value. As we change $F$, we shall maintain the invariant that if $f(t_1, \ldots, t_k)$ and
$f(t'_1, \ldots, t'_k)$ are members of $T$ where $F(t_i) = F(t'_i)$, for $1 \leq i \leq k$, then
$F(f(t_1, \ldots, t_k)) = F(f(t'_1, \ldots, t'_k))$. This is certainly true initially, since $F$ is
initially one-to-one on members of $T$.

Let $N$ be the set of all of the new null values (the values initially assigned to $F(t)$ when
$t$ is not in $D$). We create an ordering $\prec$ on $D \cup N$, where the members of $D$ are an
initial segment of the ordering $\prec$, followed by the members of $N$.

We now begin chasing $I$ with the SO egd parts of $\Phi$ to change the values of $F$. Whenever
$t$ is a member of $T$ such that we replace a current value of $F(t)$ by a new value during the
chase process, we will always replace the current value of $F(t)$ by a value that is lower in
the ordering $\prec$. If $s_1(\mathbf{y}_1) = s_2(\mathbf{y}_2)$ is an equality in the premise of an
SO egd part of $\Phi$, then the equality $s_1(\mathbf{e}_1) = s_2(\mathbf{e}_2)$ evaluates to "true",
where $\mathbf{e}_1$ and $\mathbf{e}_2$ consist of members of $D$, precisely if the current value of
$F(s_1(\mathbf{e}_1))$ equals the current value of $F(s_2(\mathbf{e}_2))$. Each time an equality
$t_1(\mathbf{a}) = t_2(\mathbf{b})$ is forced (because of an SO egd part with conclusion
$t_1(\mathbf{x}) = t_2(\mathbf{y})$), and the current value of $F(t_1(\mathbf{a}))$ does not equal the
current value of $F(t_2(\mathbf{b}))$, we proceed as follows. Let $c_1$ be the smaller of these two
values and let $c_2$ be the larger of these two values in our ordering $\prec$. If $c_2$ is a
constant, then the chase fails and halts. Otherwise, for every member $s$ of $T$ where the current
value of $F(s)$ is $c_2$, change the value so that the new value of $F(s)$ is $c_1$. Note that under
this change, the new value of $F(t_1(\mathbf{a}))$ and the new value of $F(t_2(\mathbf{b}))$ are the
same (namely, $c_1$).

These changes in $F$ may propagate new changes in $F$, which we need to make in order to
maintain the invariant. Assume that as a result of our changes in $F$ so far, there are terms
$f(t_1, \ldots, t_k)$ and $f(t'_1, \ldots, t'_k)$ in $T$ where $F(t_i) = F(t'_i)$, for
$1 \leq i \leq k$, but $F(f(t_1, \ldots, t_k))$ and $F(f(t'_1, \ldots, t'_k))$ are different. As
before, let $c_1$ be the smaller of these two values and let $c_2$ be the larger of these two values
in our ordering $\prec$. If $c_2$ is a constant, then the chase fails and halts. Otherwise, for every
member $s$ of $T$ where the current value of $F(s)$ is $c_2$, change the value so that the new value
of $F(s)$ is $c_1$. Note that under this change, the new value of $F(f(t_1, \ldots, t_k))$ and the
new value of $F(f(t'_1, \ldots, t'_k))$ are the same (namely, $c_1$). Continue this process until no
more changes occur. It is easy to see that we have maintained our invariant. Continue chasing with
SO egd parts until no more changes occur. Note that at most as many changes can occur as the size of
$T$, since every time a change occurs, there are strictly fewer values of $F(t)$ as $t$ ranges over
$T$. This is the key reason why the chase runs in polynomial time.
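
The table update just described can be sketched in Python. This is a minimal illustration, not the authors' implementation, and it covers only the step "an equality has been forced": evaluating the premises of SO egd parts is left out. The encoding is an assumption: terms are nested tuples, table values are pairs `("const", c)` or `("null", i)` so that tuple comparison places every constant before every null (playing the role of $\prec$), and all function names are hypothetical.

```python
from itertools import product

class ChaseFailure(Exception):
    """Raised when two distinct constants would have to be equated."""

def herbrand(domain, arities, depth):
    # All terms of nesting depth <= depth; f(t1..tk) is encoded as ("f", t1, ..., tk).
    terms = set(domain)
    for _ in range(depth):
        terms = terms | {(f, *args) for f, k in arities.items()
                         for args in product(terms, repeat=k)}
    return terms

def initial_table(domain, arities, depth):
    # F(a) = a for constants; every other term starts with its own fresh null.
    table, fresh = {}, 0
    for term in sorted(herbrand(domain, arities, depth), key=repr):
        if isinstance(term, tuple):
            table[term] = ("null", fresh)
            fresh += 1
        else:
            table[term] = ("const", term)
    return table

def merge(table, v1, v2):
    # Replace the larger value (c2) by the smaller one (c1) everywhere in the table.
    if v1 == v2:
        return
    low, high = sorted((v1, v2))
    if high[0] == "const":               # then both values are constants: failure
        raise ChaseFailure(f"cannot equate constants {low[1]} and {high[1]}")
    for term, value in table.items():
        if value == high:
            table[term] = low

def restore_invariant(table):
    # Equate F(f(t1..tk)) and F(f(t1'..tk')) whenever F(ti) = F(ti') for every i.
    changed = True
    while changed:
        changed = False
        seen = {}
        for term in table:
            if not isinstance(term, tuple):
                continue
            key = (term[0], *(table[arg] for arg in term[1:]))
            other = seen.setdefault(key, term)
            if table[other] != table[term]:
                merge(table, table[other], table[term])
                changed = True
                break                    # values changed, so rescan from scratch

def force_equal(table, t1, t2):
    # One forced equality t1 = t2, followed by propagation.
    merge(table, table[t1], table[t2])
    restore_invariant(table)

F = initial_table({0, 1}, {"f": 1, "g": 1}, depth=2)
force_equal(F, ("f", 0), 1)                      # some SO egd part forces f(0) = 1
print(F[("g", ("f", 0))] == F[("g", 1)])         # True: propagated by the invariant
```

Every successful call to `merge` removes one value from the range of the table, which mirrors the counting argument above for the polynomial bound.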

Once $F$ has stabilized, so that no more changes are caused by chasing with the SO egd
parts of $\Phi$, then chase $I$ with the SO tgd parts of $\Phi$. If
$s_1(\mathbf{y}_1) = s_2(\mathbf{y}_2)$ is an equality in the premise of an SO tgd part of $\Phi$,
then the equality $s_1(\mathbf{e}_1) = s_2(\mathbf{e}_2)$ evaluates to "true", where $\mathbf{e}_1$
and $\mathbf{e}_2$ consist of members of $I$, precisely if
$F(s_1(\mathbf{e}_1)) = F(s_2(\mathbf{e}_2))$. These chase steps produce the target relation $J$ that
is taken to be the result of the chase (and we say that the chase succeeds).

We have the following theorem about the chase process. 

Theorem 5.1. Let $\Phi$ be a fixed st-SO dependency. The chase of a ground instance $I$ with
$\Phi$ runs in time polynomial in the size of $I$. The chase fails precisely if there is no solution
for $I$ with respect to $\Phi$. If the chase succeeds, then it produces a universal solution for $I$
with respect to $\Phi$.

Proof. We first show that the chase of a ground instance $I$ with $\Phi$ runs in time polynomial
in the size of $I$ (when $\Phi$ is held fixed). It is straightforward to show by induction on depth
that the size of $T$ is polynomial in the size of $D$. As we noted, during the chase with the
SO egd parts, there are at most as many changes in the current value of $F$ as the size of
$T$. So only polynomially many changes occur in the values of $F$. For each such change,
there is only a polynomial amount of work: the time needed to chase each SO egd part
and update $F$ if needed, along with the time to check the invariant and update $F$ if needed.
Finally, since the SO tgd parts are source-to-target, it follows easily that the final portion
of the chase, that is, chasing with the SO tgd parts, can also be done in polynomial time.
Therefore, the entire chase can be carried out in polynomial time.

Assume for now that there is $J$ such that $(I, J) \models \Phi$. If the existentially quantified
function symbols of $\Phi$ are given by $\mathbf{f}$, then let $\mathbf{f}^0$ denote the
instantiation of $\mathbf{f}$ that shows that $(I, J) \models \Phi$. For each member $t$ of $T$, let
$t^0$ be the value obtained by replacing the function symbols $\mathbf{f}$ by $\mathbf{f}^0$. By
induction on the steps in the chase process, we can show that at all points during the chase
process, if the current value of $F(t_1)$ equals the current value of $F(t_2)$, then necessarily
$t_1^0 = t_2^0$. Thus, whenever we set two values $F(t_1)$ and $F(t_2)$ equal during the process of
chasing with the SO egd parts, we are forced to do so. This is clear when two values are made equal
because of the conclusion of an SO egd part. It is also true when two values are made equal because
of maintaining the invariant, because the invariant is needed for the functions in $\mathbf{f}^0$ to
be well-defined. For example, assume that $f^0$ and $g^0$ are unary functions in $\mathbf{f}^0$. If
$f^0(a) = b$, then necessarily $g^0(f^0(a)) = g^0(b)$, and this is reflected by the requirement of
the invariant that $F(g(f(a))) = F(g(b))$. Since the only time we make two values equal in the table
for $F$ is when we are forced to, it follows that if the chase process fails for $I$ (because we try
to set two constants to be equal), then there is no solution for $I$ with respect to $\Phi$. We now
consider the other case, where the chase succeeds for $I$.

Assume that the chase succeeds for $I$. Let $n$ be the maximal depth of nesting of
function symbols in $\Phi$. Use (the final values of) the table for $F$ to define our functions
$\mathbf{f}^0$ on the Herbrand universe of depth $n - 1$. For example, if $a$ is in the active domain
$D$, and $F(f(a)) = c$, then let $f^0(a) = c$. Similarly, if $a$ and $b$ are in $D$, and
$F(h(g(a), b)) = d$, then let $h^0(g^0(a), b) = d$. The invariant ensures that the functions in
$\mathbf{f}^0$ are well-defined on the Herbrand universe of depth $n - 1$ (and the table then gives us
the values of all members of the Herbrand universe of depth $n$). Our semantics requires the
functions in $\mathbf{f}^0$ to be defined not just on the Herbrand universe of depth $n - 1$, but on
the entire universe. If $f$ is a $k$-ary function symbol in $\mathbf{f}$, and if $c_1, \ldots, c_k$
are values such that $f^0(c_1, \ldots, c_k)$ is not already determined by the rules we have given,
then let $f^0(c_1, \ldots, c_k)$ be arbitrary. The key point is that $\Phi$ refers only to terms in
the Herbrand universe of depth $n$, so what happens outside of the Herbrand universe of depth $n$ is
irrelevant, as far as satisfaction of $\Phi$ is concerned. The chase with the SO egd parts forces
equalities among the values of the functions so that $I$ (together with the choice of the functions)
satisfies the SO egd parts of $\Phi$. If $J$ is the result of the chase, then the chase with the SO
tgd parts forces $(I, J)$ to satisfy the SO tgd parts of $\Phi$. Hence, $J$ is a solution for $I$
with respect to $\Phi$, as desired.

It is also clear that if $J'$ is an arbitrary solution for $I$ with respect to $\Phi$, then up to a
replacement (not necessarily one-to-one) of nulls in $J$ by other values (nulls or constants),
every tuple of every relation that appears in $J$ must appear in the corresponding relation
of $J'$ (since tuples are produced in the chase only if needed, and equalities are forced in the
chase only if needed). But this means that there is a homomorphism from $J$ into $J'$, so $J$
is a universal solution, as desired. □

Because there is a polynomial-time chase for st-SO dependencies, there is also a
polynomial-time chase for SO-standard schema mappings: first, chase with the st-SO dependency,
and then with the target dependencies. The reason that chasing with the target dependencies
requires only polynomial time is that the number of steps in this chase is polynomial,
because of the weak acyclicity assumption (Theorem 3.9 of [22]). We therefore can extend
Theorem 5.1 to apply to SO-standard schema mappings. We state this in the following
corollary.

Corollary 5.1. Let $\mathcal{M}$ be an SO-standard schema mapping. The chase of a ground instance
$I$ with $\mathcal{M}$ runs in time polynomial in the size of $I$. The chase fails precisely if there
is no solution for $I$ with respect to $\mathcal{M}$. If the chase succeeds, then it produces a
universal solution for $I$ with respect to $\mathcal{M}$.

Note that in particular, Corollary 5.1 tells us that there is a polynomial-time algorithm for
determining, given a source instance $I$, whether there is a solution for $I$, and if so, producing
a universal solution for $I$.

As shown in [22], we can use a universal solution to obtain the certain answers to
unions of conjunctive queries in polynomial time. We now recall the definition of the
certain answers. Let $\mathcal{M} = (\mathbf{S}, \mathbf{T}, \Sigma)$ be a schema mapping, and let
$q$ be a $k$-ary query posed against the target schema $\mathbf{T}$. Denote by $q(J)$ the result of
evaluating $q$ on a target instance $J$. If $I$ is a source instance, then the certain answers of $q$
on $I$ with respect to $\mathcal{M}$, denoted by $\mathrm{certain}_{\mathcal{M}}(q, I)$, are the
$k$-tuples $t$ such that, for every solution $J$ of $I$ with respect to $\mathcal{M}$, we have that
$t \in q(J)$. It should be noticed that if a source instance $I$ does not have any solution with
respect to the mapping $\mathcal{M}$, then $\mathrm{certain}_{\mathcal{M}}(q, I) = D^k$
(recall that $D$ is the countably infinite domain from which the entries of tuples are taken),
as every $k$-tuple trivially satisfies the previous condition. In this case, we use the special
symbol $\top$ to indicate that every $k$-tuple belongs to $\mathrm{certain}_{\mathcal{M}}(q, I)$,
that is, we say that $\mathrm{certain}_{\mathcal{M}}(q, I) = \top$. If $U$ is a universal solution for
$I$ with respect to $\mathcal{M}$, and $q$ is a union of conjunctive queries, then it is shown in
[22] that $\mathrm{certain}_{\mathcal{M}}(q, I)$ equals $q(U)_{\downarrow}$, which is the result of
evaluating $q$ on $U$ and then keeping only those tuples formed entirely of values from $I$ (that is,
tuples that do not contain nulls). The equality
$\mathrm{certain}_{\mathcal{M}}(q, I) = q(U)_{\downarrow}$ holds for arbitrarily specified schema
mappings $\mathcal{M}$ (as long as such a universal solution $U$ exists). Corollary 5.1 therefore has
the following corollary, which is analogous to the same corollary in [23] for mappings specified by
SO tgds.

Corollary 5.2. Let $\mathcal{M}$ be an SO-standard schema mapping. Let $q$ be a union of
conjunctive queries over the target schema $\mathbf{T}$. Then for every ground instance $I$ over
$\mathbf{S}$, the set $\mathrm{certain}_{\mathcal{M}}(q, I)$ can be computed in polynomial time (in
the size of $I$).

Proof. Assume that the arity of query $q$ is $k$, where $k > 0$. Then the polynomial-time
algorithm to compute $\mathrm{certain}_{\mathcal{M}}(q, I)$ works as follows. It first checks (using
the polynomial-time algorithm of Corollary 5.1) whether $I$ has a solution with respect to
$\mathcal{M}$. If not, then $\mathrm{certain}_{\mathcal{M}}(q, I) = D^k$, and the algorithm returns
symbol $\top$ to indicate that every tuple with $k$ elements belongs to
$\mathrm{certain}_{\mathcal{M}}(q, I)$. Otherwise $I$ has at least one solution with respect to
$\mathcal{M}$, and the algorithm computes a universal solution $U$ for $I$ as in Corollary 5.1, and
then it returns $q(U)_{\downarrow}$ (recall that, as discussed above,
$\mathrm{certain}_{\mathcal{M}}(q, I) = q(U)_{\downarrow}$). □
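
The last step of this algorithm, computing $q(U)_{\downarrow}$ from a universal solution, can be sketched as follows. The sketch makes several assumptions that are not in the text: nulls are modelled by a hypothetical `Null` class, a failed chase is signalled by passing `None`, the string `TOP` stands in for the symbol $\top$, and the query is an arbitrary Python callable rather than a parsed union of conjunctive queries.

```python
class Null:
    """A labelled null; anything that is not a Null is treated as a constant."""
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return f"Null({self.label!r})"

TOP = "TOP"  # placeholder for the symbol meaning "every k-tuple is an answer"

def certain_answers(query, universal_solution):
    """Return q(U) restricted to null-free tuples, or TOP when no solution exists.

    `universal_solution` is None when the chase failed; `query` is any callable
    that evaluates a union of conjunctive queries on an instance.
    """
    if universal_solution is None:
        return TOP
    return {row for row in query(universal_solution)
            if not any(isinstance(value, Null) for value in row)}

# Hypothetical universal solution with a single binary relation T.
U = {"T": {(1, 2), (1, Null("n1")), (3, 4)}}
q = lambda inst: set(inst["T"])          # the query "return all of T"
print(sorted(certain_answers(q, U)))     # [(1, 2), (3, 4)]
```

The tuple containing the null is dropped because some solution could replace that null by a different value.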

6. A Positive Result: SO-Standard Schema Mappings are the Needed Class 

In this section, we show that SO-standard schema mappings (those specified by an st-SO 
dependency, along with target constraints consisting of t-egds and a weakly acyclic set of 
t-tgds) exactly correspond to the composition of standard schema mappings. 

6.1. Using SO-standard schema mappings to define compositions. Before we show 
that the composition of an arbitrary number of standard schema mappings is equivalent to 
an SO-standard schema mapping, we first show that target constraints are needed (that is, 
st-SO dependencies by themselves are not enough). In fact, the next proposition says that 
st-SO dependencies, without target constraints, are not capable of specifying even schema 
mappings specified by st-tgds and a set of key dependencies. 

Proposition 6.1. There exists a schema mapping
$\mathcal{M}_{12} = (\mathbf{S}_1, \mathbf{S}_2, \Sigma_{12}, \Sigma_2)$, where $\Sigma_{12}$ is a
set of st-tgds and $\Sigma_2$ is a set of key dependencies, such that $\mathcal{M}_{12}$ cannot be
specified by an st-SO dependency.

As we shall see, we get an easy proof of Proposition 6.1 by using the following simple
proposition, which is analogous to the same result for st-tgds [19].

Proposition 6.2. Let $\sigma_{12}$ be an st-SO dependency, let $I$ be a source instance, and let $J$
be a target instance. If $(I, J) \models \sigma_{12}$ and $J \subseteq J'$, then
$(I, J') \models \sigma_{12}$.

Proof of Proposition 6.1. Let $\mathbf{S}_1 = \{S(\cdot,\cdot)\}$, $\mathbf{S}_2 = \{T(\cdot,\cdot)\}$
and $\Sigma_{12} = \{S(x,y) \to T(x,y)\}$, and assume that $\Sigma_2$ consists of the single key
dependency $T(x,y) \wedge T(x,z) \to y = z$. By way of contradiction, assume that $\mathcal{M}_{12}$
can be specified by an st-SO dependency $\sigma_{12}$. Let $I = \{S(1,2)\}$, $J = \{T(1,2)\}$ and
$J' = \{T(1,2), T(1,3)\}$. Given that $(I, J) \models \Sigma_{12} \cup \Sigma_2$, and $\sigma_{12}$
specifies $\mathcal{M}_{12}$, we have that $(I, J) \models \sigma_{12}$. So by Proposition 6.2 we have
that $(I, J') \models \sigma_{12}$. Since $\sigma_{12}$ specifies $\mathcal{M}_{12}$, we therefore
have that $(I, J') \models \Sigma_{12} \cup \Sigma_2$. But this is a contradiction, since
$J' \not\models \Sigma_2$. □

Let $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ be standard schema mappings. The previous negative
result implies that st-SO dependencies by themselves cannot necessarily specify the composition
$\mathcal{M}_{12} \circ \mathcal{M}_{23}$. Our next theorem, which we shall prove shortly, implies
that $\mathcal{M}_{12} \circ \mathcal{M}_{23}$ is equivalent to an SO-standard schema mapping
$\mathcal{M}_{13}$. In fact, it says that we can take the target constraints of $\mathcal{M}_{13}$ to
be the set $\Sigma_3$ of target constraints of $\mathcal{M}_{23}$. Intuitively, this theorem tells us
that st-SO dependencies are expressive enough to capture the intermediate target constraints in
a composition.

Theorem 6.3. Let $\mathcal{M}_{12} = (\mathbf{S}_1, \mathbf{S}_2, \Sigma_{12}, \Sigma_2)$ and
$\mathcal{M}_{23} = (\mathbf{S}_2, \mathbf{S}_3, \Sigma_{23}, \Sigma_3)$ be standard schema mappings
(so that $\Sigma_{12}$, $\Sigma_{23}$ are sets of st-tgds, and $\Sigma_i$ ($i = 2, 3$) is the union
of a set of t-egds and a weakly acyclic set of t-tgds). Then there exists an st-SO dependency
$\sigma_{13}$ such that the mapping
$\mathcal{M}_{13} = (\mathbf{S}_1, \mathbf{S}_3, \sigma_{13}, \Sigma_3)$ is equivalent to the
composition $\mathcal{M}_{12} \circ \mathcal{M}_{23}$.

In Section 6.2 we show that the composition of SO-standard schema mappings is also an
SO-standard schema mapping. By combining this result with Theorem 6.3 (and using the
simple fact, noted earlier, that every standard schema mapping is an SO-standard schema
mapping), we obtain our desired result, namely, that the composition of a finite number of
standard schema mappings is equivalent to an SO-standard schema mapping.

It is straightforward to show that Theorem 6.3 is a consequence of the following
proposition.

Proposition 6.4. Let $\mathcal{M}_{12}$ be a standard schema mapping, and let $\mathcal{M}_{23}$ be
an st-tgd mapping (no target constraints). Then the composition
$\mathcal{M}_{12} \circ \mathcal{M}_{23}$ can be specified by an st-SO dependency.

As pointed out in Section 4, the class of st-SO dependencies corresponds to the
source-to-target restriction of the class of $\mathrm{Sk}\forall\mathrm{CQ}^{=}$ dependencies
introduced in [42]. In fact, Theorem 6.3 and Proposition 6.4 were essentially established in [42]
(see Theorems 6 and 9 and the paragraph after Theorem 10 in [42]), but they are restated and
clarified here for the sake of completeness. We also show here how Proposition 6.4 is proved, which
is a straightforward adaptation of the proofs of Theorems 6 and 9 in [42], and the comments in
the paragraph after Theorem 10 to handle a weakly acyclic set of target tgds.

We now demonstrate, by example, how an st-SO dependency $\sigma_{13}$ is obtained from
$\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ in Proposition 6.4 (it will be clear how to extend from the
example to the general case). Assume that $\mathbf{S}_1 = \{A(\cdot,\cdot), B(\cdot)\}$,
$\mathbf{S}_2 = \{C(\cdot,\cdot), D(\cdot,\cdot)\}$, $\mathbf{S}_3 = \{E(\cdot,\cdot,\cdot)\}$.
Furthermore, suppose that $\Sigma_{12}$ consists of the following st-tgds:

\begin{align}
A(x,y) &\to C(x,y), \notag\\
B(x) &\to \exists y\, C(x,y), \tag{6.1}
\end{align}

$\Sigma_2$ consists of the following t-tgds:

\begin{align}
C(x,y) \wedge C(y,z) &\to C(z,x), \notag\\
C(x,y) &\to \exists z\, D(x,z), \tag{6.2}\\
C(x,x) &\to D(x,x), \notag\\
D(x,y) &\to D(y,x), \notag
\end{align}

and $\Sigma_{23}$ consists of the st-tgd:

\begin{align}
D(x,y) \to \exists z\, E(x,y,z). \tag{6.3}
\end{align}

To obtain $\sigma_{13}$, we first Skolemize each dependency in $\Sigma_{12}$, $\Sigma_2$ and
$\Sigma_{23}$ to obtain the sets $\mathcal{E}(\Sigma_{12})$, $\mathcal{E}(\Sigma_2)$ and
$\mathcal{E}(\Sigma_{23})$ of dependencies, respectively. So we replace (6.1), (6.2) and (6.3)
by:

\begin{align*}
B(x) &\to C(x, f(x)),\\
C(x,y) &\to D(x, g(x,y)),\\
D(x,y) &\to E(x, y, h(x,y)),
\end{align*}

respectively. Then for predicates $C$ and $D$, we introduce functions $f_C$, $g_C$, $f_D$ and $g_D$,
where $f_C$, $g_C$ have the same arity as $C$, and where $f_D$, $g_D$ have the same arity as $D$, and
we define $\sigma_{13}$ as:

\[ \exists f\, \exists g\, \exists h\, \exists f_C\, \exists g_C\, \exists f_D\, \exists g_D\ \Psi, \]

where $f$, $g$ and $h$ are the Skolem functions introduced above and $\Psi$ is a conjunction of a
set of dependencies defined as follows. As predicate $C$ cannot be mentioned in $\Psi$, functions
$f_C$ and $g_C$ are used to replace it: the equality $f_C(\mathbf{a}) = g_C(\mathbf{a})$ is used to
indicate that $C(\mathbf{a})$ holds. Thus, the first two conjuncts of $\Psi$ are generated from
$\mathcal{E}(\Sigma_{12})$ by replacing $C(\mathbf{x})$ by $f_C(\mathbf{x}) = g_C(\mathbf{x})$:

\begin{align}
A(x,y) &\to f_C(x,y) = g_C(x,y), \notag\\
B(x) &\to f_C(x, f(x)) = g_C(x, f(x)). \tag{6.4}
\end{align}

Similarly, functions $f_D$ and $g_D$ are used to replace predicate $D$, and the dependencies in
$\mathcal{E}(\Sigma_2)$ are used to generate the following conjuncts of $\Psi$:

\begin{align}
\mathrm{dom}(x) \wedge \mathrm{dom}(y) \wedge \mathrm{dom}(z) \wedge f_C(x,y) = g_C(x,y) \wedge f_C(y,z) = g_C(y,z) &\to f_C(z,x) = g_C(z,x), \tag{6.5}\\
\mathrm{dom}(x) \wedge \mathrm{dom}(y) \wedge f_C(x,y) = g_C(x,y) &\to f_D(x, g(x,y)) = g_D(x, g(x,y)), \tag{6.6}\\
\mathrm{dom}(x) \wedge f_C(x,x) = g_C(x,x) &\to f_D(x,x) = g_D(x,x), \tag{6.7}\\
\mathrm{dom}(x) \wedge \mathrm{dom}(y) \wedge f_D(x,y) = g_D(x,y) &\to f_D(y,x) = g_D(y,x), \tag{6.8}
\end{align}

where $\mathrm{dom}(\cdot)$ is a formula that defines the domain of the instances of $\mathbf{S}_1$,
that is, $\mathrm{dom}(x)$ is $\exists y\, A(x,y) \vee \exists z\, A(z,x) \vee B(x)$. This predicate
is included in the previous dependencies to satisfy the safety condition of st-SO dependencies,
namely, that every variable mentioned in a term has to be mentioned in a source predicate. We then
use the standard approach for eliminating disjunctions in a premise (for example,
$\varphi_1 \vee \varphi_2 \to \psi$ can be replaced by the two formulas $\varphi_1 \to \psi$ and
$\varphi_2 \to \psi$).

Notice that if an equality $f_C(a, f(a)) = g_C(a, f(a))$ can be inferred by using
dependency (6.4), then we know that $C(a, f(a))$ holds. Thus, since $D(a, g(a, f(a)))$ can
be obtained from $C(a, f(a))$ and the dependency $C(x,y) \to D(x, g(x,y))$, it should be
possible to infer that $f_D(a, g(a, f(a))) = g_D(a, g(a, f(a)))$ holds by using the fact that
$f_C(a, f(a)) = g_C(a, f(a))$ holds and the dependencies in $\Psi$. However, if $\mathrm{dom}(f(a))$
does not hold, then $f_C(a, f(a)) = g_C(a, f(a))$ does not satisfy the premise of dependency (6.6)
and, therefore, $f_D(a, g(a, f(a))) = g_D(a, g(a, f(a)))$ cannot be inferred by using this
dependency. To overcome this limitation, we also instantiate the above four dependencies with the
terms that appear in the tuples that are generated by repeatedly applying the formulas in
$\mathcal{E}(\Sigma_2)$. More precisely, it is possible to infer that only terms of the form $x$ and
$f(y)$ need to be considered for the case of predicate $C$ and, thus, dependencies (6.5), (6.6) and
(6.7) are instantiated with all the possible combinations of these types of terms. For example, the
following is one of the conjuncts of $\Psi$ generated from formula (6.5):

\begin{align*}
\mathrm{dom}(x) \wedge \mathrm{dom}(y) \wedge \mathrm{dom}(z) \wedge {} & f_C(f(x), y) = g_C(f(x), y) \wedge f_C(y, f(z)) = g_C(y, f(z)) \to {}\\
& f_C(f(z), f(x)) = g_C(f(z), f(x)),
\end{align*}

while the following dependency is one of the conjuncts of $\Psi$ generated from formula (6.7):

\begin{align}
\mathrm{dom}(x) \wedge \mathrm{dom}(y) \wedge {} & f_C(f(x), f(y)) = g_C(f(x), f(y)) \wedge {} \notag\\
& f(x) = f(y) \to f_D(f(x), f(y)) = g_D(f(x), f(y)). \tag{6.9}
\end{align}

Notice that in the previous dependency we have included the equality $f(x) = f(y)$, as it
can be the case that $f(a) = f(b)$ holds for distinct elements $a$ and $b$. Similarly, it is possible
to infer that only terms of the form $x$, $f(y)$, $g(x,y)$, $g(x,f(y))$, $g(f(x),y)$ and $g(f(x),f(y))$
need to be considered for the case of predicate $D$. Thus, dependency (6.8) is instantiated
with all the possible combinations of these types of terms. For example, the following is
one of the conjuncts of $\Psi$ generated by this process:

$$\mathrm{dom}(x) \wedge \mathrm{dom}(y) \wedge \mathrm{dom}(z) \wedge f_D(f(x), g(f(y),z)) = g_D(f(x), g(f(y),z)) \rightarrow f_D(g(f(y),z), f(x)) = g_D(g(f(y),z), f(x)).$$

Finally, the last conjuncts of $\Psi$ are generated from dependency $D(x,y) \rightarrow E(x,y,h(x,y))$
as above. For example, the following are two of these conjuncts:

$$\mathrm{dom}(x) \wedge \mathrm{dom}(y) \wedge f_D(x,y) = g_D(x,y) \rightarrow E(x,y,h(x,y)),$$

$$\mathrm{dom}(x) \wedge \mathrm{dom}(y) \wedge \mathrm{dom}(z) \wedge f_D(f(x),g(f(y),f(z))) = g_D(f(x),g(f(y),f(z))) \rightarrow E(f(x),g(f(y),f(z)),h(f(x),g(f(y),f(z)))).$$

It is important to notice that the weak acyclicity of $\Sigma_2$ guarantees that the above process
terminates. That is, we need only consider terms up to a certain fixed depth of nesting. In
particular, in the above example, we need to consider only terms where the nesting depth
of functions is at most 2.

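Example 6.1. We conclude this section by showing why weak acyclicity is necessary to
guarantee the termination of the above process. Assume that $\mathcal{M}_{12} = (\mathbf{S}_1, \mathbf{S}_2, \Sigma_{12}, \Sigma_2)$ and
$\mathcal{M}_{23} = (\mathbf{S}_2, \mathbf{S}_3, \Sigma_{23})$, where $\mathbf{S}_1 = \{A(\cdot,\cdot)\}$, $\mathbf{S}_2 = \{B(\cdot,\cdot)\}$, $\mathbf{S}_3 = \{C(\cdot,\cdot)\}$, $\Sigma_{12}$ consists of
the following st-tgd: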

A(x,y) -> B(x,y), 

$\Sigma_2$ consists of the following t-tgd:

$$B(x,y) \rightarrow \exists z\, B(y,z), \qquad (6.10)$$

and $\Sigma_{23}$ consists of the st-tgd:

$$B(x,y) \rightarrow C(x,y).$$

Notice that $\mathcal{M}_{12}$ is not a standard schema mapping, as $\Sigma_2$ is not weakly acyclic.

In order to obtain an st-SO dependency $\sigma_{13}$ that specifies the composition of $\mathcal{M}_{12}$ and
$\mathcal{M}_{23}$, the above process first Skolemizes each dependency in $\Sigma_{12}$, $\Sigma_2$ and $\Sigma_{23}$ to obtain the
sets $\mathcal{E}(\Sigma_{12})$, $\mathcal{E}(\Sigma_2)$ and $\mathcal{E}(\Sigma_{23})$ of dependencies, respectively. In particular, the t-tgd (6.10)
is replaced by the dependency:

$$B(x,y) \rightarrow B(y,h(x,y)). \qquad (6.11)$$

Then binary functions $f_B$ and $g_B$ are introduced, and $\sigma_{13}$ is defined as $\exists h\, \exists f_B\, \exists g_B\, \Psi$, where
$\Psi$ is a conjunction of a set of dependencies defined as follows. The first conjunct of $\Psi$ is
generated from $\mathcal{E}(\Sigma_{12})$ by replacing $B(x,y)$ by $f_B(x,y) = g_B(x,y)$:

$$A(x,y) \rightarrow f_B(x,y) = g_B(x,y). \qquad (6.12)$$

Then functions $f_B$ and $g_B$ are used to eliminate predicate $B$ from $\mathcal{E}(\Sigma_2)$. In particular, the
following conjunct is included in $\Psi$:

$$\mathrm{dom}(x) \wedge \mathrm{dom}(y) \wedge f_B(x,y) = g_B(x,y) \rightarrow f_B(y,h(x,y)) = g_B(y,h(x,y)), \qquad (6.13)$$

where $\mathrm{dom}(\cdot)$ is a formula that defines the domain of the instances of $\mathbf{S}_1$, that is, $\mathrm{dom}(x)$ is
$\exists u\, A(x,u) \vee \exists v\, A(v,x)$. As mentioned above, predicate $\mathrm{dom}(\cdot)$ is included in the previous
dependency to satisfy the safety condition of st-SO dependencies.

It should be noticed that if $(a,b)$ is a tuple in $A$, one can infer that $f_B(a,b) = g_B(a,b)$ holds
by considering dependency (6.12), and then one can infer that $f_B(b,h(a,b)) = g_B(b,h(a,b))$
holds by considering dependency (6.13). By definition of $\sigma_{13}$, this implies that $B(b,h(a,b))$
holds, from which one concludes that $B(h(a,b),h(b,h(a,b)))$ also holds (from dependency
(6.11)). Thus, in this case it should be possible to infer that

$$f_B(h(a,b), h(b,h(a,b))) = g_B(h(a,b), h(b,h(a,b))) \qquad (6.14)$$

holds from the dependencies in $\Psi$. However, if $\mathrm{dom}(h(a,b))$ does not hold, then one cannot
infer equality (6.14) from dependency (6.13) and the fact that $f_B(b,h(a,b)) = g_B(b,h(a,b))$
holds. This forces one to instantiate dependency (6.13) with the terms that appear in
the tuples that are generated by repeatedly applying (6.11). In particular, the following
dependency is included as a conjunct of $\Psi$ to be able to infer (6.14) from equality
$f_B(b,h(a,b)) = g_B(b,h(a,b))$:

$$\mathrm{dom}(x) \wedge \mathrm{dom}(y) \wedge f_B(x,h(x,y)) = g_B(x,h(x,y)) \rightarrow f_B(h(x,y), h(x,h(x,y))) = g_B(h(x,y), h(x,h(x,y))).$$

The previous dependencies are used to deal with the terms where the nesting depth of
functions is at most 2. But given that $\Sigma_2$ is not weakly acyclic, one also needs to deal
with the terms where the nesting depth of functions is 3, which forces one to include the
following dependency as a conjunct of $\Psi$:

$$\mathrm{dom}(x) \wedge \mathrm{dom}(y) \wedge f_B(h(x,y),h(x,h(x,y))) = g_B(h(x,y),h(x,h(x,y))) \rightarrow f_B(h(x,h(x,y)), h(h(x,y),h(x,h(x,y)))) = g_B(h(x,h(x,y)), h(h(x,y),h(x,h(x,y)))).$$

It is not difficult to see that the process does not terminate in this case: from the preceding
dependency one needs to generate a formula to deal with the terms where the nesting depth
of functions is 4, which in turn has to be used to generate a dependency to deal with nesting
depth 5, and so on. □
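
The growth of nesting depth in Example 6.1 can be illustrated with a minimal Python sketch. It is not part of the construction itself: the term encoding (strings for constants, tuples for Skolem terms) and the helper names `depth`, `apply_6_11` and `max_depth_after` are assumptions made only for this illustration. The sketch repeatedly applies the Skolemized dependency (6.11) to a single starting tuple and reports the deepest term produced in the last round.

```python
# Terms: a constant is a str; a Skolem term h(s, t) is the tuple ("h", s, t).
# This encoding and all function names are illustrative assumptions.

def depth(term):
    """Nesting depth of function symbols in a term (0 for a constant)."""
    if isinstance(term, str):
        return 0
    return 1 + max(depth(arg) for arg in term[1:])


def apply_6_11(b_tuples):
    """One round of B(x, y) -> B(y, h(x, y)) over a set of B-tuples."""
    return {(y, ("h", x, y)) for (x, y) in b_tuples}


def max_depth_after(rounds, start=("a", "b")):
    """Maximum nesting depth among the B-tuples produced in the last round."""
    frontier = {start}
    for _ in range(rounds):
        frontier = apply_6_11(frontier)
    return max(depth(t) for pair in frontier for t in pair)


if __name__ == "__main__":
    for n in range(5):
        print(n, max_depth_after(n))
```

Under these assumptions, each round raises the maximum nesting depth by one, which mirrors why no fixed depth bound suffices when the set of t-tgds is not weakly acyclic.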
6.2. Composability of SO-standard schema mappings. The next theorem implies
that the composition of SO-standard schema mappings is an SO-standard schema mapping. 
This is the final step we need to show that the composition of a finite number of standard 
schema mappings is given by an SO-standard schema mapping. 

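Theorem 6.5. For every pair $\mathcal{M}_{12} = (\mathbf{S}_1, \mathbf{S}_2, \sigma_{12}, \Sigma_2)$ and $\mathcal{M}_{23} = (\mathbf{S}_2, \mathbf{S}_3, \sigma_{23}, \Sigma_3)$ of
schema mappings, where $\sigma_{12}$, $\sigma_{23}$ are st-SO dependencies and $\Sigma_i$ ($i = 2, 3$) is the union of a
set of t-egds and a weakly acyclic set of t-tgds, there exists an st-SO dependency $\sigma_{13}$ such that
the schema mapping $\mathcal{M}_{13} = (\mathbf{S}_1, \mathbf{S}_3, \sigma_{13}, \Sigma_3)$ is equivalent to the composition $\mathcal{M}_{12} \circ \mathcal{M}_{23}$.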

Note that, just as in Theorem 6.3, the set $\Sigma_3$ used in $\mathcal{M}_{23}$ is also used in $\mathcal{M}_{13}$. Theorem
6.5 was essentially established in [32] (see Theorems 6 and 9 and the paragraph after
Theorem 10 in [32]), since the class of st-SO dependencies corresponds to the source-to-target
restriction of the class of $\mathrm{Sk}\forall\mathrm{CQ}^{=}$ dependencies introduced in [32].

As pointed out in Section 6.1, the previous result is fundamental to showing that
SO-standard schema mappings can define the composition of standard schema mappings, since
from the combination of this result with Theorem 6.3 (and using the simple fact that every
standard schema mapping is an SO-standard schema mapping), we obtain the following
theorem as a consequence.

Theorem 6.6. The composition of a finite number of standard schema mappings is equiv- 
alent to an SO-standard schema mapping. 

6.3. SO-standard schema mappings are exactly the needed class. We have introduced
st-SO dependencies (and SO-standard schema mappings) because of Theorem 6.6.
In this section, we show that SO-standard schema mappings are exactly the needed class,
since the converse of Theorem 6.6 also holds. Specifically, we have the following theorem.

Theorem 6.7. Every SO-standard schema mapping is equivalent to the composition of a 
finite number of standard schema mappings. 

This is proven by showing the following: 

Theorem 6.8. Every schema mapping $\mathcal{M} = (\mathbf{S}, \mathbf{T}, \sigma_{st})$, where $\sigma_{st}$ is an st-SO dependency,
is equivalent to the composition of a finite number of schema mappings, each specified by
st-tgds and t-egds.

Note that, somewhat surprisingly, we do not need to make use of a weakly acyclic set
of t-tgds (or any t-tgds at all) in Theorem 6.8. In particular, let $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ be as in
Proposition 6.4 (where the specification of $\mathcal{M}_{12}$ may make use of a weakly acyclic set of
t-tgds). By Proposition 6.4, the composition is given by a schema mapping $\mathcal{M}_{13}$ specified by
an st-SO dependency; furthermore, by Theorem 6.8, we know that $\mathcal{M}_{13}$ is the composition
of a finite number of schema mappings, each specified by st-tgds and t-egds (no t-tgds). So
$\mathcal{M}_{12} \circ \mathcal{M}_{23}$ needs no t-tgds to specify it, even though $\mathcal{M}_{12}$ makes use of t-tgds.

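We now show how Theorem 6.7 follows from Theorem 6.8. Let $\mathcal{M} = (\mathbf{S}, \mathbf{T}, \sigma_{st}, \Sigma_t)$ be
an SO-standard schema mapping (where $\sigma_{st}$ is an st-SO dependency, and $\Sigma_t$ is the union
of a set of t-egds and a weakly acyclic set of t-tgds). Let $\mathcal{M}' = (\mathbf{S}, \mathbf{T}, \sigma_{st})$, where we
discard $\Sigma_t$ from $\mathcal{M}$. By Theorem 6.8, where the role of $\mathcal{M}$ is played by $\mathcal{M}'$, we know
that there are schema mappings $\mathcal{M}_1, \ldots, \mathcal{M}_k$, each specified by st-tgds and t-egds, such
that $\mathcal{M}' = \mathcal{M}_1 \circ \cdots \circ \mathcal{M}_k$. Assume that $\mathcal{M}_k = (\mathbf{S}', \mathbf{T}, \sigma_{st}, \Gamma_k)$, with $\Gamma_k$ consisting only
of t-egds. Let $\mathcal{M}'_k = (\mathbf{S}', \mathbf{T}, \sigma_{st}, \Gamma_k \cup \Sigma_t)$. Then $\mathcal{M}_1, \ldots, \mathcal{M}_{k-1}, \mathcal{M}'_k$ are standard schema
mappings ($\mathcal{M}'_k$ is a standard schema mapping, since its only t-tgds are those in $\Sigma_t$). Since
$(\mathbf{S}, \mathbf{T}, \sigma_{st}) = \mathcal{M}_1 \circ \cdots \circ \mathcal{M}_k$, it follows easily that $(\mathbf{S}, \mathbf{T}, \sigma_{st}, \Sigma_t) = \mathcal{M}_1 \circ \cdots \circ \mathcal{M}_{k-1} \circ \mathcal{M}'_k$.
Thus, $\mathcal{M} = \mathcal{M}_1 \circ \cdots \circ \mathcal{M}_{k-1} \circ \mathcal{M}'_k$.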

We now demonstrate, by example, how Theorem 6.8 is proved (again, it will be clear
how to extend from the example to the general case). Our proof is an extension of the proof
of Theorem 8.2 in [23], that every SO tgd specifies the composition of a finite number of
st-tgd mappings.

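Assume that $\mathbf{S} = \{S(\cdot)\}$, $\mathbf{T} = \{T(\cdot,\cdot)\}$, $\Sigma_t = \emptyset$ and $\sigma_{st}$ is the following st-SO dependency:

$$\exists f\, \exists g\, \big[\forall x\, (S(x) \rightarrow T(f(g(x)), g(f(x)))) \wedge \forall x \forall y\, (S(x) \wedge S(y) \wedge f(x) = f(y) \rightarrow g(x) = g(y))\big].$$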

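Next we construct schema mappings $\mathcal{M}_{12} = (\mathbf{S}_1, \mathbf{S}_2, \Sigma_{12}, \Sigma_2)$, $\mathcal{M}_{23} = (\mathbf{S}_2, \mathbf{S}_3, \Sigma_{23}, \Sigma_3)$ and
$\mathcal{M}_{34} = (\mathbf{S}_3, \mathbf{S}_4, \Sigma_{34})$ such that (1) $\mathbf{S}_1 = \mathbf{S}$, (2) $\mathbf{S}_4 = \mathbf{T}$, (3) $\Sigma_{12}$, $\Sigma_{23}$ and $\Sigma_{34}$ are sets of
st-tgds, (4) $\Sigma_2$ and $\Sigma_3$ are sets of t-egds, and (5) the mapping specified by $\sigma_{st}$ is equivalent
to $\mathcal{M}_{12} \circ \mathcal{M}_{23} \circ \mathcal{M}_{34}$.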

Define $\mathbf{S}_2$ as $\{R_1(\cdot), F_1(\cdot,\cdot), G_1(\cdot,\cdot)\}$ and $\Sigma_{12}$ to consist of the following st-tgds:

$$S(x) \rightarrow R_1(x),$$

$$S(x) \rightarrow \exists y\, F_1(x,y),$$

$$S(x) \rightarrow \exists y\, G_1(x,y).$$

Intuitively, we take $R_1$ to copy $S$, we take $F_1(x,y)$ to encode $f(x) = y$, and we take
$G_1(x,y)$ to encode $g(x) = y$. In particular, the second and third dependencies have the
effect of guaranteeing that $f(x)$ and $g(x)$ are defined for every element $x$ in $S$, respectively.

Given that $\Sigma_{12}$ cannot guarantee that $F_1$ and $G_1$ each define a single image for every
element in $S$, we let $\Sigma_2$ consist of the following t-egds:

$$F_1(x,y) \wedge F_1(x,z) \rightarrow y = z,$$

$$G_1(x,y) \wedge G_1(x,z) \rightarrow y = z,$$

which guarantee that $F_1$ and $G_1$ encode functions. In the same way, define $\mathbf{S}_3$ as $\{R_2(\cdot), F_2(\cdot,\cdot), G_2(\cdot,\cdot)\}$
and $\Sigma_{23}$ to consist of the following st-tgds:

$$R_1(x) \rightarrow R_2(x),$$

$$F_1(x,y) \rightarrow F_2(x,y),$$

$$G_1(x,y) \rightarrow G_2(x,y),$$

$$F_1(x,y) \rightarrow \exists z\, G_2(y,z),$$

$$G_1(x,y) \rightarrow \exists z\, F_2(y,z).$$

Intuitively, we take $R_2$ to copy $R_1$, $F_2$ to copy $F_1$, and $G_2$ to copy $G_1$, and we include the
fourth dependency to guarantee that $g(y)$ is defined for all $y$ in the range of $f$, and we
include the fifth dependency to guarantee that $f(y)$ is defined for all $y$ in the range of $g$.
Also as in the previous case, we include in $\Sigma_3$ two t-egds that guarantee that $F_2$ and $G_2$
are indeed functions:

$$F_2(x,y) \wedge F_2(x,z) \rightarrow y = z,$$

$$G_2(x,y) \wedge G_2(x,z) \rightarrow y = z.$$
Given that at this point, we have predicates that encode the values of all the terms that
are used in $\sigma_{st}$, we also include in $\Sigma_3$ dependencies that encode the conjuncts of $\sigma_{st}$ of the
form $\forall \bar{x}\, (\psi \rightarrow t_1 = t_2)$. Thus, in this case we include in $\Sigma_3$ the following t-egd that encodes
the conjunct $\forall x \forall y\, (S(x) \wedge S(y) \wedge f(x) = f(y) \rightarrow g(x) = g(y))$:

$$R_2(x) \wedge R_2(y) \wedge F_2(x,z) \wedge F_2(y,z) \wedge G_2(x,u) \wedge G_2(y,v) \rightarrow u = v.$$

Finally, we use $R_2$, $F_2$ and $G_2$ to encode the remaining conjuncts of $\sigma_{st}$, which indicate
how to populate the target relations of $\sigma_{st}$. Thus, we define $\Sigma_{34}$ to consist of the following
st-tgd:

$$R_2(x) \wedge G_2(x,y_1) \wedge F_2(y_1,y_2) \wedge F_2(x,z_1) \wedge G_2(z_1,z_2) \rightarrow T(y_2,z_2).$$

This concludes the demonstration by example of how to prove Theorem 6.8. This
demonstration gives, as a special case (when the st-SO dependency is unnested), the following
lemma (where we note also the number of schema mappings that are composed).

Lemma 6.9. Every schema mapping $\mathcal{M} = (\mathbf{S}, \mathbf{T}, \sigma_{st})$, where $\sigma_{st}$ is an unnested st-SO
dependency, is equivalent to the composition of two schema mappings, each specified by
st-tgds and t-egds.

We note that Theorem 6.8 follows immediately from Lemma 6.9 and the fact, as we show
later, that every st-SO dependency is equivalent to an unnested st-SO dependency.
Therefore, we really needed to prove only Lemma 6.9 (the unnested case) rather than the general
case that we dealt with in proving Theorem 6.8.
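
The three-step construction used in the example for Theorem 6.8 can be traced on a concrete instance with a small Python sketch. This is only an illustration under stated assumptions: the function name `compose_example` is hypothetical, and the dictionaries `f` and `g` stand in for one particular choice of witnesses for the existential quantifiers (the construction itself allows any witnesses that satisfy the t-egds). Relations are modelled as sets of tuples.

```python
def compose_example(S, f, g):
    """Return (T, egd_ok) for one choice of witnesses f and g.

    S is a set of source elements; f and g are dicts playing the role of the
    existentially quantified functions (illustrative witnesses only).
    """
    # Sigma_12: copy S into R1 and record f(x) and g(x) for each x in S.
    R1 = set(S)
    F1 = {(x, f[x]) for x in S}
    G1 = {(x, g[x]) for x in S}

    # Sigma_23: copy R1, F1, G1, then define g on the range of f
    # and f on the range of g.
    R2 = set(R1)
    F2 = F1 | {(y, f[y]) for (_, y) in G1}
    G2 = G1 | {(y, g[y]) for (_, y) in F1}

    # t-egd in Sigma_3 encoding f(x) = f(y) -> g(x) = g(y) on elements of R2.
    egd_ok = all(
        u == v
        for x in R2 for y in R2
        for (x1, z1) in F2 if x1 == x
        for (y1, z2) in F2 if y1 == y and z1 == z2
        for (x2, u) in G2 if x2 == x
        for (y2, v) in G2 if y2 == y
    )

    # Sigma_34: R2(x), G2(x,y1), F2(y1,y2), F2(x,z1), G2(z1,z2) -> T(y2,z2).
    T = {
        (y2, z2)
        for x in R2
        for (a, y1) in G2 if a == x
        for (b, y2) in F2 if b == y1
        for (c, z1) in F2 if c == x
        for (d, z2) in G2 if d == z1
    }
    return T, egd_ok


if __name__ == "__main__":
    f = {1: 3, 2: 4, 3: 1, 4: 2}
    g = {1: 4, 2: 3, 3: 2, 4: 1}
    print(compose_example({1, 2}, f, g))
```

With these witnesses, the tuples placed in `T` coincide with the pairs (f(g(x)), g(f(x))) for x in S, which is what the first conjunct of the st-SO dependency requires.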

7. Collapsing Results: Nesting is Not Necessary 

Recall that we say that an st-SO dependency or SO tgd is unnested if its depth of nesting 
is at most 1. Thus, an unnested st-SO dependency or SO tgd can contain terms like f(x), 
but not terms like $f(g(x))$. In this section, we present collapsing results about the depth of
nesting of function symbols in st-SO dependencies and SO tgds. Specifically, we prove the 
following two theorems. 

Theorem 7.1. Every st-SO dependency is equivalent to an unnested st-SO dependency. 

Theorem 7.2. Every SO tgd is equivalent to an unnested SO tgd.

These two results, especially the second one, are the most technically difficult results
in the paper. Both results are surprising, since the "obvious" way to try to denest, which
we now describe, does not work. Consider for example the SO tgd

$$\exists f\, \exists g\, \forall x \forall y\, (P(x,y) \wedge (f(g(x)) = y) \rightarrow Q(x,y)). \qquad (7.1)$$

The "obvious" way to denest (7.1) is to introduce a new variable $z$ and rewrite (7.1) as

$$\exists f\, \exists g\, \forall x \forall y \forall z\, (P(x,y) \wedge (g(x) = z) \wedge (f(z) = y) \rightarrow Q(x,y)). \qquad (7.2)$$

However, the formula (7.2) is not an SO tgd, since it violates the safety condition (because
the variable $z$ does not appear in $P(x,y)$, the only relational atomic formula in the premise
of (7.2)).
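
The safety violation in (7.2) can be checked mechanically. The following Python sketch is a simplified check of the premise only, written under assumptions of our own: variables are strings, a function term is a tuple whose first component is the function name, and the helper names `term_vars` and `is_safe` are hypothetical.

```python
def term_vars(term):
    """Variables of a term; a variable is a str, f(t1,...,tn) is ("f", t1, ..., tn)."""
    if isinstance(term, str):
        return {term}
    return set().union(*(term_vars(arg) for arg in term[1:]))


def is_safe(relational_atoms, equalities):
    """Check that every variable used in a premise equality occurs in a relational atom.

    relational_atoms: list of (predicate_name, list_of_variables).
    equalities: list of (left_term, right_term).
    """
    covered = {v for _, args in relational_atoms for v in args}
    used = set()
    for left, right in equalities:
        used |= term_vars(left) | term_vars(right)
    return used <= covered


if __name__ == "__main__":
    premise_atoms = [("P", ["x", "y"])]
    # Premise of (7.1): P(x, y) and f(g(x)) = y.
    print(is_safe(premise_atoms, [(("f", ("g", "x")), "y")]))
    # Premise of (7.2): P(x, y), g(x) = z and f(z) = y.
    print(is_safe(premise_atoms, [(("g", "x"), "z"), (("f", "z"), "y")]))
```

In this encoding the premise of (7.1) passes the check, while the premise of (7.2) fails it because `z` occurs only in equalities.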

It should be mentioned that in [38], Libkin and Sirangelo introduce the second-order
language of Skolemized STDs (SkSTDs), and study some of its fundamental properties. In
particular, it is shown in [38] that this language is closed under composition if the premises
of SkSTDs are restricted to be conjunctive queries. Interestingly, this fragment of SkSTDs
is similar to the language of SO tgds but does not allow nesting of functions, which may lead
one to think that Theorem 7.2 can be deduced from the results in [38]. However, no safety
condition is imposed on the premises of SkSTDs in [38] and, thus, nesting of functions is
not needed in this language as it can be eliminated in the "obvious" way shown above. In
fact, dependency (7.2) is a valid constraint according to [38].

Before giving the proofs of Theorems 7.1 and 7.2, we present and discuss some corollaries
of these theorems.

Corollary 7.1. The composition of a finite number of st-tgd mappings can be specified by 
an unnested SO tgd. 

This is a strengthening of the result (Theorem 8.1 in [23]) that the composition of a
finite number of st-tgd mappings can be specified by an SO tgd (thus, Corollary 7.1 says
that we can replace "SO tgd" in Theorem 8.1 in [23] by "unnested SO tgd"). Corollary 7.1
follows immediately from the result we just cited (Theorem 8.1 in [23]) and our Theorem 7.2.
It was not even known before that the composition of two st-tgd mappings can be specified
by an unnested SO tgd. Thus, although it was shown in [23] that each unnested SO tgd
specifies the composition of some pair of st-tgd mappings, the converse was not shown.
In fact, for the composition of two st-tgd mappings, the composition construction in [23]
produces an SO tgd whose depth of nesting can be 2, not 1.

We feel that nested dependencies are difficult to understand (just think about an
equality like $f(g(x),h(f(x,y))) = g(f(x,h(y)))$), and probably also difficult to use in practice.
On the other hand, unnested dependencies seem to be more natural and readable. For
example, it is easy to see that the "nested mappings" in [25] can be expressed by unnested
SO tgds. Corollary 7.1 tells us that unnested SO tgds are also expressive enough to specify
the composition of an arbitrary number of st-tgd mappings.

Theorem 7.2 has as another corollary the following collapsing result about the number
of compositions of st-tgd mappings.

Corollary 7.2. The composition of a finite number of st-tgd mappings is equivalent to the 
composition of two st-tgd mappings. 

This follows from Corollary 7.1 and the fact (which is a special case of Theorem 8.4 of
[23]) that a schema mapping specified by an unnested SO tgd is equivalent to the
composition of two st-tgd mappings.

The next two corollaries follow from Theorem 7.1 just as Corollaries 7.1 and 7.2 follow
from Theorem 7.2.

Corollary 7.3. The composition of a finite number of standard schema mappings can be 
specified by an unnested st-SO dependency, along with t-egds and a weakly acyclic set of 
t-tgds. 

Corollary 7.4. The composition of a finite number of standard schema mappings is equiv- 
alent to the composition of two standard schema mappings. 

In fact, it follows from Corollary 7.3 and Lemma 6.9 that we can slightly strengthen
Corollary 7.4 as follows.

Corollary 7.5. The composition of a finite number of standard schema mappings is
equivalent to the composition $\mathcal{M}_1 \circ \mathcal{M}_2$ of two standard schema mappings $\mathcal{M}_1$ and $\mathcal{M}_2$, where
the target constraints of $\mathcal{M}_1$ are only t-egds (no t-tgds).
Corollary 7.4 has a direct, almost trivial proof that does not use our heavy machinery,
as we now show. Let $\mathcal{M}_{12}, \mathcal{M}_{23}, \ldots, \mathcal{M}_{k-1\,k}$ be standard schema mappings. Define $\mathcal{M}'_{12}$
to have source schema the same as $\mathcal{M}_{12}$, target schema equal to the union of the target
schemas of $\mathcal{M}_{12}, \ldots, \mathcal{M}_{k-2\,k-1}$, and constraints equal to the union of the constraints of
$\mathcal{M}_{12}, \ldots, \mathcal{M}_{k-2\,k-1}$. Because all of the schemas are disjoint, it is easy to see that $\mathcal{M}'_{12}$ is
a standard schema mapping (note that the st-tgds of $\mathcal{M}_{23}, \ldots, \mathcal{M}_{k-2\,k-1}$ are now being
treated as t-tgds of $\mathcal{M}'_{12}$). Then it is clear that

$$\mathcal{M}_{12} \circ \mathcal{M}_{23} \circ \cdots \circ \mathcal{M}_{k-1\,k} = \mathcal{M}'_{12} \circ \mathcal{M}_{k-1\,k}.$$

In contrast to Corollary 7.4, the reason that Corollary 7.2 is quite unexpected is that there
is no obvious way to deal with all of the st-tgds in the intermediate schema mappings.

Corollary 7.3, unlike Corollary 7.4, does not seem to have a simple direct proof that
avoids the machinery of Theorem 7.1. This is because our construction of the composition
of two standard schema mappings produces an st-SO dependency whose nesting depth can
be arbitrarily large.

Based on our collapsing results, there are two alternative ways to deal with the
composition of multiple st-tgd mappings. First, by Corollary 7.1, we can replace this composition
by a single schema mapping, specified by an unnested SO tgd. Second, by Corollary 7.2,
we can replace the composition by the composition of only two st-tgd mappings. Similarly,
by using Corollaries 7.3 and 7.4, we have two alternative ways to deal with the composition
of a large number of standard schema mappings.

We now provide the proofs of Theorems 7.1 and 7.2.

Proof of Theorem 7.1. In this proof, we use the following terminology. Given a term $t$,
recursively define the set of non-atomic sub-terms of $t$, denoted by non-atomic($t$), and the
list of variables of $t$, denoted by list-var($t$), as follows: (1) if $t = x$, where $x$ is a variable,
then non-atomic($t$) $= \emptyset$ and list-var($t$) $= [x]$; (2) if $t = f(t_1, t_2, \ldots, t_n)$, where $t_1, t_2, \ldots, t_n$
are terms, then

$$\text{non-atomic}(t) = \{f(t_1, \ldots, t_n)\} \cup \bigcup_{i=1}^{n} \text{non-atomic}(t_i)$$

and list-var($t$) $=$ list-var($t_1$) $\cdot$ list-var($t_2$) $\cdots$ list-var($t_n$), where $L_1 \cdot L_2$ is the result of appending
$L_2$ to $L_1$. For example, if $t = f(x,h(z,y,g(x)))$, then non-atomic($t$) $= \{f(x,h(z,y,g(x))), h(z,y,g(x)), g(x)\}$
and list-var($f(x,h(z,y,g(x)))$) $= [x,z,y,x]$. Moreover, consider a term
replacement skel($\cdot$) that describes the skeleton of a term. For example, if $t = f(x,g(y))$, then
skel($t$) is $f(\_,g(\_))$, as this shows what are the functions that have been included in $t$ and
how they have been nested in this term.$^{1}$ More precisely, recursively define skel($\cdot$) as follows:
(1) skel($x$) $= \_$ for every variable $x$, and (2) skel($f(t_1, \ldots, t_n)$) $= f(\text{skel}(t_1), \ldots, \text{skel}(t_n))$,
for every $n$-ary symbol $f$ and terms $t_1, \ldots, t_n$. For example, skel($f(x,h(z,y,g(x)))$) $= f(\_,h(\_,\_,g(\_)))$.
Finally, by considering function skel($\cdot$), define a second term replacement
$\xi(\cdot)$ as follows:

$$\xi(t) = \begin{cases} t & \text{if } t \text{ is a variable,} \\ \xi_{\mathrm{skel}(t)}(x_1, \ldots, x_n) & \text{if } t \text{ is a non-atomic term and list-var}(t) = [x_1, \ldots, x_n]. \end{cases} \qquad (7.3)$$



1 It should be noticed that a similar term replacement was used in [17] to eliminate function expressions
from a logic program. However, the term replacement used in [17] considers only terms without nesting of
function symbols.
For example, if $t = f(x,h(z,y,g(x)))$, then $\xi(t) = \xi_{f(\_,h(\_,\_,g(\_)))}(x,z,y,x)$.
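
The four term operations defined above are directly computable. The following Python sketch is one possible encoding, given purely as an illustration: a variable is a string, a term $f(t_1, \ldots, t_n)$ is the tuple `("f", t1, ..., tn)`, and the new function symbol for a skeleton is represented by the string `"xi_"` followed by the skeleton. These representation choices and the function names are assumptions of the sketch.

```python
def non_atomic(t):
    """Set of non-atomic sub-terms of t (empty for a variable)."""
    if isinstance(t, str):
        return set()
    result = {t}
    for arg in t[1:]:
        result |= non_atomic(arg)
    return result


def list_var(t):
    """List of variable occurrences of t, left to right, with repetitions."""
    if isinstance(t, str):
        return [t]
    out = []
    for arg in t[1:]:
        out.extend(list_var(arg))
    return out


def skel(t):
    """Skeleton of t: every variable occurrence is replaced by '_'."""
    if isinstance(t, str):
        return "_"
    return t[0] + "(" + ",".join(skel(arg) for arg in t[1:]) + ")"


def xi(t):
    """Unnested replacement of t: a variable is kept, a non-atomic term becomes
    the skeleton function symbol applied to list_var(t)."""
    if isinstance(t, str):
        return t
    return ("xi_" + skel(t), *list_var(t))


if __name__ == "__main__":
    t = ("f", "x", ("h", "z", "y", ("g", "x")))
    print(sorted(skel(s) for s in non_atomic(t)))
    print(list_var(t))
    print(xi(t))
```

Note that `xi(t)` always has nesting depth at most 1 in this encoding, since its arguments are variables only.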

Given an st-SO dependency $\sigma$ from a source schema $\mathbf{S}$ to a target schema $\mathbf{T}$, next we
show how to construct an unnested st-SO dependency $\sigma^*$ from $\mathbf{S}$ to $\mathbf{T}$ such that $\sigma$ and $\sigma^*$
are equivalent. Let $t_1, \ldots, t_\ell$ be the non-atomic terms $t$ such that there exists an atomic
formula mentioned in $\sigma$ of the form either $t = t'$ or $t' = t$ or $R(t_1, \ldots, t_{i-1}, t, t_{i+1}, \ldots, t_k)$
(where $k$ is the arity of $R$ and $i \in \{1, \ldots, k\}$), let $\mathcal{H}(\sigma) = \{t_1, \ldots, t_\ell\}$, and let $ST(\sigma)$ be the
set of all non-atomic sub-terms of $t_1, \ldots, t_\ell$, that is, $ST(\sigma) = \bigcup_{i=1}^{\ell} \text{non-atomic}(t_i)$. Then
define $\Xi(\sigma)$ as the following set of function symbols:

$$\Xi(\sigma) = \{\xi_{\mathrm{skel}(t)} \mid t \in ST(\sigma)\}.$$

For example, if $\sigma$ is the following st-SO dependency:

$$\exists f\, \exists g\, \big[\forall x \forall y\, (S(x,y) \rightarrow T(x, f(x,g(y)), g(f(y,g(x)))))\big],$$

then $\mathcal{H}(\sigma)$ is the set $\{f(x,g(y)), g(f(y,g(x)))\}$. Thus, given that non-atomic($f(x,g(y))$) $=
\{f(x,g(y)), g(y)\}$ and non-atomic($g(f(y,g(x)))$) $= \{g(f(y,g(x))), f(y,g(x)), g(x)\}$, we have
that $ST(\sigma) = \{f(x,g(y)), g(y), g(f(y,g(x))), f(y,g(x)), g(x)\}$, and

$$\Xi(\sigma) = \{\xi_{f(\_,g(\_))},\ \xi_{g(\_)},\ \xi_{g(f(\_,g(\_)))}\}.$$

Note that two members of $ST(\sigma)$, namely $g(y)$ and $g(x)$, have the same skeleton $g(\_)$, as
do $f(x,g(y))$ and $f(y,g(x))$, which have the same skeleton $f(\_,g(\_))$. The set $\Xi(\sigma)$ plays a
fundamental role in the definition of the st-SO dependency $\sigma^*$. More precisely, assume that
$\Xi(\sigma) = \{\chi_1, \chi_2, \ldots, \chi_m\}$. Then $\sigma^*$ is defined as:

$$\exists \chi_1\, \exists \chi_2 \cdots \exists \chi_m\, \Psi,$$

where $\Psi$ is defined as the conjunction of the following dependencies. For every conjunct
$\forall \bar{x}\, (\varphi \rightarrow \psi)$ of $\sigma$, the st-SO dependency $\sigma^*$ contains a conjunct $\forall \bar{x}\, (\varphi' \rightarrow \psi')$, where (1)
$\varphi'$ is obtained from $\varphi$ by replacing every non-atomic term $t \in ST(\sigma)$ by $\xi(t)$, and (2) $\psi'$
is obtained from $\psi$ by replacing every non-atomic term $t \in ST(\sigma)$ by $\xi(t)$. Furthermore,
for every pair of (not necessarily distinct) terms $t, t' \in ST(\sigma)$, if $t = f(t_1, \ldots, t_n)$ and
$t' = f(t'_1, \ldots, t'_n)$, the following procedure is executed to obtain a set $\Gamma_{t,t'}$ of dependencies,
and then $\bigwedge \Gamma_{t,t'}$ is included as a conjunct of $\sigma^*$. First, replace each occurrence of a variable
in $t$ by a fresh variable to obtain a term $s = f(s_1, \ldots, s_n)$, and replace each occurrence of
a variable in $t'$ by a fresh variable to obtain a term $s' = f(s'_1, \ldots, s'_n)$ (in particular, $s$ and
$s'$ have no variables in common). Assume that $\{x_1, \ldots, x_p\}$ is the set of variables mentioned
in $s$, $s'$. Second, as in the proof of Proposition 6.4, let $\mathrm{dom}(\cdot)$ be a formula that defines the
domain of the instances of $\mathbf{S}$. Finally, let $\Gamma_{t,t'}$ be a set of dependencies obtained from the
dependency:



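$$\forall x_1 \cdots \forall x_p \Big( \bigwedge_{i=1}^{p} \mathrm{dom}(x_i) \wedge \bigwedge_{i=1}^{n} \xi(s_i) = \xi(s'_i) \rightarrow \xi(s) = \xi(s') \Big) \qquad (7.4)$$

by repeatedly using the equivalences:

$$((\alpha \vee \beta) \wedge \gamma) \rightarrow \delta \ \equiv\ ((\alpha \wedge \gamma) \rightarrow \delta) \wedge ((\beta \wedge \gamma) \rightarrow \delta), \qquad (7.5)$$

$$(\exists x\, \alpha) \rightarrow \beta \ \equiv\ \forall x\, (\alpha \rightarrow \beta) \quad \text{if } x \text{ is not mentioned in } \beta, \qquad (7.6)$$

until all the disjunctions and existential quantifications in the left-hand side of (7.4) have
been eliminated.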
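
The repeated use of equivalence (7.5) amounts to choosing one disjunct from every disjunctive conjunct of the premise. The following Python sketch illustrates only this step (it does not handle the existential quantifiers covered by (7.6)). Formulas are kept as plain strings, and the function name `split_premise` as well as the sample predicates $A$ and $B$ are assumptions made for the illustration.

```python
from itertools import product


def split_premise(disjunctive_conjuncts, plain_conjuncts, conclusion):
    """Apply (7.5) repeatedly: return one dependency per choice of disjuncts.

    disjunctive_conjuncts: list of lists, each inner list is a disjunction.
    plain_conjuncts: conjuncts of the premise that contain no disjunction.
    Each result is a pair (list_of_premise_conjuncts, conclusion).
    """
    return [
        (list(choice) + list(plain_conjuncts), conclusion)
        for choice in product(*disjunctive_conjuncts)
    ]


if __name__ == "__main__":
    # Assumed example: dom(u) is A(u) or B(u), and dom(v) is A(v) or B(v).
    deps = split_premise(
        [["A(u)", "B(u)"], ["A(v)", "B(v)"]],
        ["xi_g(_)(u) = xi_g(_)(v)"],
        "xi_f(g(_))(u) = xi_f(g(_))(v)",
    )
    for premise, conclusion in deps:
        print(" AND ".join(premise), "->", conclusion)
```

For a premise with $d$ binary disjunctions, this produces $2^d$ disjunction-free dependencies, each of which satisfies the safety condition as long as every variable is covered by the chosen source atoms.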
Example 7.6. Let us give the intuition behind the definition of $\sigma^*$ through an example.
Assume that $\sigma$ is the following st-SO dependency:

$$\exists f\, \exists g\, \big[\forall x\, (A(x) \rightarrow (T(x, f(g(x))) \wedge f(x) = g(x))) \wedge \forall x\, (B(x) \rightarrow U(x, g(f(x))))\big]. \qquad (7.7)$$

Then we have that $ST(\sigma) = \{f(g(x)), g(x), g(f(x)), f(x)\}$ and $\Xi(\sigma) = \{\xi_{f(g(\_))}, \xi_{g(\_)}, \xi_{g(f(\_))}, \xi_{f(\_)}\}$.
Intuitively, $\xi_{f(\_)}$ and $\xi_{g(\_)}$ are used to represent functions $f$ and $g$, respectively, and
$\xi_{f(g(\_))}$ and $\xi_{g(f(\_))}$ are used to represent the composition functions $(g \circ f)$ and $(f \circ g)$,
respectively, thus eliminating the nesting of functions from $\sigma$. More precisely, the st-SO
dependency $\sigma^*$ is defined as:

$$\exists \xi_{f(g(\_))}\, \exists \xi_{g(\_)}\, \exists \xi_{g(f(\_))}\, \exists \xi_{f(\_)}\, \Psi,$$

where $\Psi$ is defined as the conjunction of the following dependencies. First, given that
$\forall x\, (A(x) \rightarrow (T(x, f(g(x))) \wedge f(x) = g(x)))$ and $\forall x\, (B(x) \rightarrow U(x, g(f(x))))$ are the conjuncts
of $\sigma$, the following dependencies are conjuncts of $\Psi$:

$$\forall x\, (A(x) \rightarrow (T(x, \xi_{f(g(\_))}(x)) \wedge \xi_{f(\_)}(x) = \xi_{g(\_)}(x))),$$

$$\forall x\, (B(x) \rightarrow U(x, \xi_{g(f(\_))}(x))).$$

Furthermore, for every pair $t, t'$ of (not necessarily distinct) terms from $ST(\sigma)$, if either
$t = f(t_1)$ and $t' = f(t'_1)$ or $t = g(t_1)$ and $t' = g(t'_1)$, then the following conjuncts are included
in $\Psi$. Assume that $t = t' = f(g(x))$. First, each occurrence of a variable in these terms is
replaced by a fresh variable, generating the terms $s = f(g(u))$ and $s' = f(g(v))$. Second,
given that the source schema consists of the unary predicates $A$ and $B$, formula $\mathrm{dom}(x)$ is
defined as $A(x) \vee B(x)$ (that is, $\mathrm{dom}(x)$ holds if $x$ is in the domain of a source instance).
Finally, assuming that $s_1 = g(u)$ and $s'_1 = g(v)$, let $\alpha$ be the following dependency:

$$\forall u \forall v\, \big(\mathrm{dom}(u) \wedge \mathrm{dom}(v) \wedge \xi(s_1) = \xi(s'_1) \rightarrow \xi(s) = \xi(s')\big),$$

that is, $\alpha$ is:

$$\forall u \forall v\, \big((A(u) \vee B(u)) \wedge (A(v) \vee B(v)) \wedge \xi_{g(\_)}(u) = \xi_{g(\_)}(v) \rightarrow \xi_{f(g(\_))}(u) = \xi_{f(g(\_))}(v)\big).$$



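Then the set $\Gamma_{t,t'}$ consists of the following dependencies:

$$\forall u \forall v\, \big(A(u) \wedge A(v) \wedge \xi_{g(\_)}(u) = \xi_{g(\_)}(v) \rightarrow \xi_{f(g(\_))}(u) = \xi_{f(g(\_))}(v)\big),$$

$$\forall u \forall v\, \big(A(u) \wedge B(v) \wedge \xi_{g(\_)}(u) = \xi_{g(\_)}(v) \rightarrow \xi_{f(g(\_))}(u) = \xi_{f(g(\_))}(v)\big),$$

$$\forall u \forall v\, \big(B(u) \wedge A(v) \wedge \xi_{g(\_)}(u) = \xi_{g(\_)}(v) \rightarrow \xi_{f(g(\_))}(u) = \xi_{f(g(\_))}(v)\big),$$

$$\forall u \forall v\, \big(B(u) \wedge B(v) \wedge \xi_{g(\_)}(u) = \xi_{g(\_)}(v) \rightarrow \xi_{f(g(\_))}(u) = \xi_{f(g(\_))}(v)\big),$$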





and each one of these four dependencies is a conjunct of $\sigma^*$. It is important to notice that
these dependencies make explicit some properties that are implicit in $\sigma$. Given that $f$ and
$g$ are function symbols in $\sigma$, we know that if $g(u) = g(v)$, then $f(g(u)) = f(g(v))$. But
this property does not immediately hold for $\xi_{f(g(\_))}$ and $\xi_{g(\_)}$ and, thus, we have to include
the above four conjuncts into $\sigma^*$ to enforce it. It should also be noticed that the formula
$\mathrm{dom}(\cdot)$ is included in the previous dependencies to satisfy the safety condition of st-SO
dependencies, namely that every variable mentioned in a term has to be mentioned in a
source predicate.

To give more intuition about the definition of dependency $\sigma^*$, we also consider the case
$t = f(g(x))$ and $t' = f(x)$. As above, we start by replacing each occurrence of a variable
in these terms by a fresh variable, generating the terms $s = f(s_1)$ and $s' = f(s'_1)$, where
$s_1 = g(u)$ and $s'_1 = v$. Then we define a dependency $\beta$ as:

$$\forall u \forall v\, \big((A(u) \vee B(u)) \wedge (A(v) \vee B(v)) \wedge \xi_{g(\_)}(u) = v \rightarrow \xi_{f(g(\_))}(u) = \xi_{f(\_)}(v)\big)$$

and, therefore, in this case the set $\Gamma_{t,t'}$ consists of the following dependencies:





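$$\forall u \forall v\, \big(A(u) \wedge A(v) \wedge \xi_{g(\_)}(u) = v \rightarrow \xi_{f(g(\_))}(u) = \xi_{f(\_)}(v)\big),$$

$$\forall u \forall v\, \big(A(u) \wedge B(v) \wedge \xi_{g(\_)}(u) = v \rightarrow \xi_{f(g(\_))}(u) = \xi_{f(\_)}(v)\big),$$

$$\forall u \forall v\, \big(B(u) \wedge A(v) \wedge \xi_{g(\_)}(u) = v \rightarrow \xi_{f(g(\_))}(u) = \xi_{f(\_)}(v)\big),$$

$$\forall u \forall v\, \big(B(u) \wedge B(v) \wedge \xi_{g(\_)}(u) = v \rightarrow \xi_{f(g(\_))}(u) = \xi_{f(\_)}(v)\big).$$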



As in the previous case, each one of the dependencies of $\Gamma_{t,t'}$ is a conjunct of $\sigma^*$. It is
important to notice that these dependencies make explicit the fact that in $\sigma$, if $g(u) = v$,
then $f(g(u)) = f(v)$.

For the st-SO dependency (7.7), we took $\mathrm{dom}(u)$ to be $A(u) \vee B(u)$. We then made use
of (7.5) to eliminate the disjunction in $A(u) \vee B(u)$. If the left-hand side of the first conjunct
of (7.7) had been $P(x,y)$ instead of $A(x)$, we would have taken $\mathrm{dom}(u)$ to be $\exists w\, P(u,w) \vee
\exists w\, P(w,u) \vee B(u)$. We then would have made use not only of (7.5), but also (7.6), to
eliminate the disjunctions and existential quantifiers in $\exists w\, P(u,w) \vee \exists w\, P(w,u) \vee B(u)$.

□

We now prove that $\sigma \equiv \sigma^*$, that is, that $\sigma$ and $\sigma^*$ are equivalent.

($\Rightarrow$) If $(I, J) \models \sigma$, then it is straightforward to prove that $(I, J) \models \sigma^*$ (the
interpretation of each function symbol in $\Xi(\sigma)$ is defined from the corresponding composition of the
interpretations of the function symbols from $\sigma$).

($\Leftarrow$) Assume that $\Xi(\sigma) = \{\chi_1, \ldots, \chi_m\}$ and that $(I, J) \models \sigma^*$ with the instantiations $\chi_1^0$,
$\ldots$, $\chi_m^0$ of $\chi_1, \ldots, \chi_m$. Moreover, assume that $f_1, \ldots, f_\ell$ are the function symbols mentioned
in $\sigma$. To show that $(I, J) \models \sigma$, we first need to define from $\chi_1^0, \ldots, \chi_m^0$ the instantiations
$f_1^0, \ldots, f_\ell^0$ of function symbols $f_1, \ldots, f_\ell$, and then we have to show that $(I, J)$ satisfies all
the conjuncts of $\sigma$ with these instantiations.

Given that $(I, J) \models \sigma^*$, we have by definition of satisfaction for st-SO dependencies that
there exists a countably infinite universe$^{2}$ $U$ such that (1) $U$ is the union of $\mathrm{dom}(I) \cup \mathrm{dom}(J)$
and a set of nulls, and (2) $(U; I, J)$ satisfies $\sigma^*$ in the standard second-order logic sense.
Assume that $\bot$ is a fresh null value ($\bot \notin U$) and that the arity of function symbol $f_i$ is
$k_i$ ($1 \leq i \leq \ell$). Then the domain of each one of the functions $f_1^0, \ldots, f_\ell^0$ is defined to be
$U \cup \{\bot\}$, and for every $(a_1, \ldots, a_{k_i}) \in (U \cup \{\bot\})^{k_i}$, we define $f_i^0(a_1, \ldots, a_{k_i})$ as follows. If
there exist $f_i(t_1, \ldots, t_{k_i}) \in ST(\sigma)$ and tuples $\bar{b}_1, \ldots, \bar{b}_{k_i}$ such that for every $j \in \{1, \ldots, k_i\}$:

• $t_j$ is a variable, $\bar{b}_j = (a_j)$ and $a_j \in \mathrm{dom}(I)$; or

• $t_j$ is a non-atomic term, $a_j = \xi^0_{\mathrm{skel}(t_j)}(\bar{b}_j)$ and $\bar{b}_j \subseteq \mathrm{dom}(I)$ (that is, every element mentioned
in $\bar{b}_j$ is in $\mathrm{dom}(I)$);

then $f_i^0(a_1, \ldots, a_{k_i})$ is defined as $\xi^0_{\mathrm{skel}(f_i(t_1, \ldots, t_{k_i}))}(\bar{b}_1, \ldots, \bar{b}_{k_i})$. Otherwise, $f_i^0(a_1, \ldots, a_{k_i})$ is
defined as $\bot$ (in particular, if $a_j = \bot$ for some $j \in \{1, \ldots, k_i\}$, then $f_i^0(a_1, \ldots, a_{k_i}) = \bot$).

2 As noted earlier, the universe can even be taken to be finite, but we do not need this.

Before showing that all the conjuncts of $\sigma$ are satisfied by $(I, J)$ under the instantiations
$f_1^0, \ldots, f_\ell^0$ of function symbols $f_1, \ldots, f_\ell$, we need to show that these functions are well
defined. That is, we have to show that if by using the above definition, one has different
ways of assigning a value to $f_i^0(a_1, \ldots, a_{k_i})$, then all these ways assign the same value to
$f_i^0(a_1, \ldots, a_{k_i})$. In order to prove this, we need to consider several cases. In this proof,
we consider only one of these cases; the other ones can be handled in the same way.

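Assume that for $i \in \{1, \ldots, \ell\}$ and elements $a_1, \ldots, a_{k_i}$ from $U \cup \{\bot\}$, it holds that (1)
$f_i(t_1, \ldots, t_{k_i}) \in ST(\sigma)$, (2) $a_j = \xi^0_{\mathrm{skel}(t_j)}(\bar{b}_j)$ and $\bar{b}_j \subseteq \mathrm{dom}(I)$, for every $j \in \{1, \ldots, k_i\}$, (3)
$f_i(s_1, \ldots, s_{k_i}) \in ST(\sigma)$, and (4) $a_j = \xi^0_{\mathrm{skel}(s_j)}(\bar{c}_j)$ and $\bar{c}_j \subseteq \mathrm{dom}(I)$, for every $j \in \{1, \ldots, k_i\}$.
Then we have to prove that:

$$\xi^0_{\mathrm{skel}(f_i(t_1, \ldots, t_{k_i}))}(\bar{b}_1, \ldots, \bar{b}_{k_i}) = \xi^0_{\mathrm{skel}(f_i(s_1, \ldots, s_{k_i}))}(\bar{c}_1, \ldots, \bar{c}_{k_i}). \qquad (7.8)$$

Given a tuple $\bar{x} = (x_1, \ldots, x_p)$ of variables, let $\mathrm{dom}(\bar{x})$ be a shorthand for $\mathrm{dom}(x_1) \wedge \cdots \wedge
\mathrm{dom}(x_p)$. By definition of $\sigma^*$ and the fact that $(I, J) \models \sigma^*$, we have that $(I, J)$ satisfies the
following instantiated dependency:

$$\Big(\bigwedge_{j=1}^{k_i} \mathrm{dom}(\bar{b}_j)\Big) \wedge \Big(\bigwedge_{j=1}^{k_i} \mathrm{dom}(\bar{c}_j)\Big) \wedge \Big(\bigwedge_{j=1}^{k_i} \xi^0_{\mathrm{skel}(t_j)}(\bar{b}_j) = \xi^0_{\mathrm{skel}(s_j)}(\bar{c}_j)\Big) \rightarrow \xi^0_{\mathrm{skel}(f_i(t_1, \ldots, t_{k_i}))}(\bar{b}_1, \ldots, \bar{b}_{k_i}) = \xi^0_{\mathrm{skel}(f_i(s_1, \ldots, s_{k_i}))}(\bar{c}_1, \ldots, \bar{c}_{k_i}).$$

Thus, we conclude that (7.8) holds, since for every $j \in \{1, \ldots, k_i\}$, it holds that $\xi^0_{\mathrm{skel}(t_j)}(\bar{b}_j) =
a_j = \xi^0_{\mathrm{skel}(s_j)}(\bar{c}_j)$, $\bar{b}_j \subseteq \mathrm{dom}(I)$ and $\bar{c}_j \subseteq \mathrm{dom}(I)$.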

Now we move to the proof that all the conjuncts of $\sigma$ are satisfied by $(I, J)$ under
the instantiations $f_1^0, \ldots, f_\ell^0$ of function symbols $f_1, \ldots, f_\ell$. In this proof, we need the
following claim, where we use the following terminology. Given a non-atomic term
$t = f_i(t_1, \ldots, t_{k_i})$ based on variables $x_1, \ldots, x_k$ and function symbols $f_1, \ldots, f_\ell$, and given a
variable substitution $\rho : \{x_1, \ldots, x_k\} \rightarrow (U \cup \{\bot\})$, the evaluation of $\rho$ over $t$ is recursively
defined as $\rho(t) = f_i^0(\rho(t_1), \ldots, \rho(t_{k_i}))$.

Claim 7.7. Let $t \in ST(\sigma)$ such that list-var$(t) = [x_1, \ldots, x_k]$. Then for every variable
substitution $\rho : \{x_1, \ldots, x_k\} \rightarrow \mathrm{dom}(I)$, it holds that $\rho(t) = \zeta^0_{\mathrm{skel}(t)}(\rho(x_1), \ldots, \rho(x_k))$.

Proof. By induction on the depth of nesting of functions in t. 

• Base case: If the depth of nesting of functions in $t$ is 1, then $t = f_i(x_1, \ldots, x_k)$ and $k = k_i$.
Then we have that $\rho(f_i(x_1, \ldots, x_k)) = f_i^0(\rho(x_1), \ldots, \rho(x_k))$. But given that $\rho(x_j) \in \mathrm{dom}(I)$
for every $j \in \{1, \ldots, k\}$, we have by definition of $f_i^0$ that $f_i^0(\rho(x_1), \ldots, \rho(x_k)) =
\zeta^0_{\mathrm{skel}(f_i(x_1,\ldots,x_k))}(\rho(x_1), \ldots, \rho(x_k))$. Thus, we conclude that $\rho(t) = \zeta^0_{\mathrm{skel}(t)}(\rho(x_1), \ldots, \rho(x_k))$.

• Inductive step: Assume that the depth of nesting of functions in $t$ is $p$, and that the
property holds for every term with depth of nesting of functions smaller than $p$. In this
case, we have that $t = f_i(t_1, \ldots, t_{k_i})$. Thus, we have that $\rho(t) = f_i^0(\rho(t_1), \ldots, \rho(t_{k_i}))$.
If $t_j$ is a variable, then we have that $\rho(t_j) \in \mathrm{dom}(I)$ since $\rho : \{x_1, \ldots, x_k\} \rightarrow \mathrm{dom}(I)$.
On the other hand, if $t_j$ is a non-atomic term such that list-var$(t_j) = [u_1, \ldots, u_q]$ (with
$[u_1, \ldots, u_q]$ a sub-list of $[x_1, \ldots, x_k]$, that is, $[u_1, \ldots, u_q]$ consisting of consecutive elements
of $[x_1, \ldots, x_k]$), then given that the depth of nesting of functions in $t_j$ is smaller than
$p$, we have by induction hypothesis that $\rho(t_j) = \zeta^0_{\mathrm{skel}(t_j)}(\rho(u_1), \ldots, \rho(u_q))$. Thus, given
that $\rho : \{x_1, \ldots, x_k\} \rightarrow \mathrm{dom}(I)$, we have by definition of $f_i^0$ that $f_i^0(\rho(t_1), \ldots, \rho(t_{k_i})) =
\zeta^0_{\mathrm{skel}(f_i(t_1,\ldots,t_{k_i}))}(\rho(x_1), \ldots, \rho(x_k))$ and, therefore, $\rho(t) = \zeta^0_{\mathrm{skel}(t)}(\rho(x_1), \ldots, \rho(x_k))$. This
concludes the proof of the claim. □

We finally have all the necessary ingredients to prove that $(I, J) \models \sigma$. More precisely, we
show next that all the conjuncts of $\sigma$ are satisfied by $(I, J)$ under the instantiations
$f_1^0, \ldots, f_\ell^0$ of function symbols $f_1, \ldots, f_\ell$. Let

$$\forall x_1 \cdots \forall x_k \forall y_1 \cdots \forall y_m\, (\varphi(x_1, \ldots, x_k, y_1, \ldots, y_m) \rightarrow \psi(x_1, \ldots, x_k)) \tag{7.9}$$

be a conjunct of $\sigma$, and let $\rho$ be a variable substitution with domain $\{x_1, \ldots, x_k, y_1, \ldots, y_m\}$
and range contained in $\mathrm{dom}(I)$, and assume that $I \models \varphi(\rho(x_1), \ldots, \rho(x_k), \rho(y_1), \ldots, \rho(y_m))$
with the instantiations $f_1^0, \ldots, f_\ell^0$. Next we show that $J \models \psi(\rho(x_1), \ldots, \rho(x_k))$. Assume
that

$$\forall x_1 \cdots \forall x_k \forall y_1 \cdots \forall y_m\, (\varphi'(x_1, \ldots, x_k, y_1, \ldots, y_m) \rightarrow \psi'(x_1, \ldots, x_k))$$

is the conjunct of $\sigma^*$ obtained from (7.9) by replacing every non-atomic term $t \in ST(\sigma)$
by $\zeta(t)$. Then given that the range of $\rho$ is contained in $\mathrm{dom}(I)$, we have by Claim 7.7 and
definition of $\sigma^*$ that $I \models \varphi'(\rho(x_1), \ldots, \rho(x_k), \rho(y_1), \ldots, \rho(y_m))$ with the instantiations
$\chi_1^0, \ldots, \chi_m^0$ of the function symbols $\chi_1, \ldots, \chi_m$ (recall that $H(\sigma) = \{\chi_1, \ldots, \chi_m\}$). Thus, we
conclude that $J \models \psi'(\rho(x_1), \ldots, \rho(x_k))$ with the instantiations $\chi_1^0, \ldots, \chi_m^0$ (as $(I, J)$ satisfies
the conjuncts of $\sigma^*$ with these instantiations). Therefore, again by Claim 7.7 and definition
of $\sigma^*$, we have that $J \models \psi(\rho(x_1), \ldots, \rho(x_k))$ with the instantiations $f_1^0, \ldots, f_\ell^0$, which was
to be shown. This concludes the proof of the theorem. □

We now move to the proof of Theorem 7.2. The following lemma will be used in this proof.

Lemma 7.3. For every SO tgd $\sigma$ of nesting depth 2, there exists an unnested SO tgd $\sigma^*$
that is equivalent to $\sigma$.

Proof. In this proof, we extensively use the terminology defined in the proof of Theorem
7.1. Besides, we say that a term $t$ is an $i$-term if the depth of nesting of function symbols
in $t$ is $i$. For example, $f(x, y)$ is a 1-term while $g(f(x, y), z)$ is a 2-term.
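The notion of an $i$-term can be checked mechanically. The following is a minimal sketch, assuming a hypothetical encoding of terms as nested Python tuples (the encoding and the function names are illustrative and not part of the paper):

```python
# Assumed encoding (for illustration only):
#   a variable is a string, e.g. "x";
#   a non-atomic term is a tuple (function_symbol, arg_1, ..., arg_k).

def nesting_depth(term):
    """Depth of nesting of function symbols in `term` (0 for a variable)."""
    if isinstance(term, str):
        return 0
    return 1 + max((nesting_depth(arg) for arg in term[1:]), default=0)


def is_i_term(term, i):
    """True if `term` is an i-term, that is, its nesting depth is exactly i."""
    return nesting_depth(term) == i


f_xy = ("f", "x", "y")          # f(x, y)
g_fxy_z = ("g", f_xy, "z")      # g(f(x, y), z)
print(is_i_term(f_xy, 1), is_i_term(g_fxy_z, 2))
```

Under this encoding, the two terms from the example above are classified as a 1-term and a 2-term, respectively.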

Let $\sigma$ be an SO tgd from a source schema S to a target schema T. Assume that the
depth of nesting of function symbols in every term mentioned in $\sigma$ is at most 2, and that
$f_1, \ldots, f_\ell$ are the function symbols mentioned in $\sigma$. Then define a set $\Theta$ of dependencies as
follows. For every conjunct $\alpha$ of $\sigma$, we include the following dependencies as elements of $\Theta$.
Assume that $t_1, \ldots, t_m$ are the 1-terms $t$ for which there exists a 2-term $t'$ mentioned in $\alpha$
such that $t \in$ non-atomic$(t')$. For example, if $\alpha$ is the following dependency:

$$\forall x \forall y\, (S(x, y) \wedge f(x) = g(f(x), f(y)) \rightarrow T(f(g(x, x)))), \tag{7.10}$$

then $f(x)$, $f(y)$ and $g(x, x)$ are the only 1-terms satisfying the preceding condition.
Furthermore, for every $i \in \{1, \ldots, m\}$, define $T_i$ as the set $\{x, f_1(\bar x_1), \ldots, f_\ell(\bar x_\ell)\}$ of terms,
where: (1) $x$ is a variable, (2) each $\bar x_j$ ($1 \le j \le \ell$) is a tuple of pairwise distinct variables,
(3) for every $j \in \{1, \ldots, \ell\}$, we have that $x$ is not mentioned in $\bar x_j$, and (4) for every pair $j$,
$k$ of distinct values in $\{1, \ldots, \ell\}$, we have that $\bar x_j$ and $\bar x_k$ do not have variables in common.
Besides, assume that for every pair $i$, $j$ of distinct values in $\{1, \ldots, m\}$, we have that $T_i$
and $T_j$ do not have variables in common, and for every $i \in \{1, \ldots, m\}$, we have that $T_i$ and
$\alpha$ do not have variables in common. For example, assuming that $t_1 = f(x)$, $t_2 = f(y)$ and
$t_3 = g(x, x)$ for the case of conjunct (7.10), we have that:

$$T_1 = \{u_1, f(u_2), g(u_3, u_4)\},$$

$$T_2 = \{u_5, f(u_6), g(u_7, u_8)\},$$

$$T_3 = \{u_9, f(u_{10}), g(u_{11}, u_{12})\},$$

satisfy the preceding conditions.
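The selection of the 1-terms and the construction of the sets $T_i$ can be illustrated for conjunct (7.10). The sketch below is only an illustration under assumptions: terms are encoded as nested tuples, fresh variables are named `u1`, `u2`, and so on, and all function names are hypothetical.

```python
from itertools import count, product


def depth(term):
    """Nesting depth of function symbols: 0 for a variable (a string)."""
    if isinstance(term, str):
        return 0
    return 1 + max((depth(arg) for arg in term[1:]), default=0)


def one_terms_inside_two_terms(terms):
    """List the 1-terms that occur as arguments of some 2-term in `terms`."""
    found = []
    for term in terms:
        if depth(term) == 2:
            for arg in term[1:]:
                if depth(arg) == 1 and arg not in found:
                    found.append(arg)
    return found


def fresh_choice_sets(num_terms, arities):
    """Build T_1, ..., T_m: a fresh variable plus one fresh application per symbol."""
    fresh = (f"u{n}" for n in count(1))
    choice_sets = []
    for _ in range(num_terms):
        choices = [next(fresh)]
        for symbol, arity in arities.items():
            choices.append((symbol, *[next(fresh) for _ in range(arity)]))
        choice_sets.append(choices)
    return choice_sets


# Terms mentioned in conjunct (7.10): f(x), g(f(x), f(y)) and f(g(x, x)).
conjunct_terms = [("f", "x"), ("g", ("f", "x"), ("f", "y")), ("f", ("g", "x", "x"))]
one_terms = one_terms_inside_two_terms(conjunct_terms)
choice_sets = fresh_choice_sets(len(one_terms), {"f": 1, "g": 2})
all_choices = list(product(*choice_sets))   # every (s_1, s_2, s_3)
print(one_terms)
print(choice_sets)
print(len(all_choices))
```

Each element of `all_choices` is one tuple $(s_1, s_2, s_3)$ with $s_i \in T_i$, so the construction considers one dependency per such tuple.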

Assume that $\alpha$ is $\forall \bar x\, (\varphi \rightarrow \psi)$. Then for every $s_1 \in T_1, \ldots, s_m \in T_m$, define dependency

$$\theta_{s_1,\ldots,s_m} \;=\; \forall \bar x\, \forall y_1 \cdots \forall y_n \Big(\varphi' \wedge \bigwedge_{i=1}^{n} \mathrm{dom}(y_i) \wedge \bigwedge_{i=1}^{m} \zeta(t_i) = \zeta(s_i) \;\rightarrow\; \psi'\Big),$$

where $\zeta$ is defined as in the proof of Theorem 7.1 (see (7.3)), $y_1, \ldots, y_n$ are the variables
mentioned in the terms $s_1, \ldots, s_m$, and $\varphi'$, $\psi'$ are obtained from $\varphi$ and $\psi$, respectively, by
replacing as follows the 1-terms and 2-terms of these formulas. Every 1-term $t$ mentioned in
$\varphi$ (resp. $\psi$) is replaced by $\zeta(t)$ in $\varphi'$ (resp. $\psi'$). Furthermore, every 2-term $t = f(t'_1, \ldots, t'_p)$
mentioned in $\varphi$ (resp. $\psi$) is replaced by $\zeta(f(t''_1, \ldots, t''_p))$ in $\varphi'$ (resp. $\psi'$), where $t''_i$ ($1 \le i \le p$)
is defined as follows. If $t'_i$ is a 1-term, and so $t'_i$ is $t_j$ for some $j$ in $\{1, \ldots, m\}$, then let $t''_i$ be
$s_j$. If $t'_i$ is a variable $v$, then let $t''_i$ be $v$.

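Finally, for each dependency $\theta_{s_1,\ldots,s_m}$, let $\Theta_{s_1,\ldots,s_m}$ be a set of dependencies obtained
from $\theta_{s_1,\ldots,s_m}$ by repeatedly using the equivalences:

$$((\alpha \vee \beta) \wedge \gamma) \rightarrow \delta \;\equiv\; ((\alpha \wedge \gamma) \rightarrow \delta) \wedge ((\beta \wedge \gamma) \rightarrow \delta),$$

$$(\exists x\, \alpha) \rightarrow \beta \;\equiv\; \forall x\, (\alpha \rightarrow \beta) \quad \text{if } x \text{ is not mentioned in } \beta,$$

until all the disjunctions and existential quantifications in the left-hand side of $\theta_{s_1,\ldots,s_m}$ have
been eliminated. Then all the dependencies in $\Theta_{s_1,\ldots,s_m}$ are included in $\Theta$.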
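The repeated use of these two equivalences amounts to putting the premise in disjunctive normal form and pulling the existential quantifiers out as universal ones. The following is a minimal sketch of that step, assuming a hypothetical tuple encoding of premises and assuming that bound variables do not occur in the conclusion and do not clash within a single conjunction; the equalities of the premise are omitted for brevity, since they are atoms that would simply be carried along.

```python
from functools import reduce

# Assumed encoding of premises (illustrative only):
#   ("atom", text), ("and", left, right), ("or", left, right), ("exists", var, body)


def split_premise(phi):
    """Return a list of (universally_quantified_vars, atoms), one per dependency."""
    kind = phi[0]
    if kind == "atom":
        return [((), (phi[1],))]
    if kind == "or":
        # ((a or b) and c) -> d  becomes two dependencies, one per disjunct.
        return split_premise(phi[1]) + split_premise(phi[2])
    if kind == "and":
        return [(left_vars + right_vars, left_atoms + right_atoms)
                for left_vars, left_atoms in split_premise(phi[1])
                for right_vars, right_atoms in split_premise(phi[2])]
    if kind == "exists":
        # (exists x a) -> b  becomes  forall x (a -> b).
        return [((phi[1],) + bound, atoms) for bound, atoms in split_premise(phi[2])]
    raise ValueError(f"unknown connective: {kind!r}")


def dom(var, tag):
    """dom(var) for a single binary source relation S: exists z S(var,z) or exists z S(z,var)."""
    return ("or",
            ("exists", f"z{tag}", ("atom", f"S({var},z{tag})")),
            ("exists", f"z{tag}", ("atom", f"S(z{tag},{var})")))


parts = [("atom", "S(x,y)"), dom("u1", 1), dom("u7", 7), dom("u8", 8), dom("u10", 10)]
premise = reduce(lambda left, right: ("and", left, right), parts)
alternatives = split_premise(premise)
print(len(alternatives))
```

With four occurrences of dom, each contributing two disjuncts, this premise yields one dependency per combination of disjuncts.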

Example 7.8. Let us give the intuition behind the definition of $\Theta$ through an example.
Assume that $\alpha$ is the following conjunct of SO tgd $\sigma$:

$$\forall x \forall y\, (S(x, y) \wedge f(x) = g(f(x), f(y)) \rightarrow T(f(g(x, x)))).$$

Then, as mentioned above, we have that $t_1 = f(x)$, $t_2 = f(y)$, $t_3 = g(x, x)$ are the 1-terms
$t$ for which there exists a 2-term $t'$ in $\alpha$ such that $t \in$ non-atomic$(t')$. Furthermore, as also
mentioned above, we can assume that:

$$T_1 = \{u_1, f(u_2), g(u_3, u_4)\},$$

$$T_2 = \{u_5, f(u_6), g(u_7, u_8)\},$$

$$T_3 = \{u_9, f(u_{10}), g(u_{11}, u_{12})\}.$$

Then for every $s_1 \in T_1$, $s_2 \in T_2$ and $s_3 \in T_3$, we have to compute the formula $\theta_{s_1,s_2,s_3}$, and
then to include all the dependencies of $\Theta_{s_1,s_2,s_3}$ as elements of $\Theta$. Assume that $s_1 = u_1$,
$s_2 = g(u_7, u_8)$ and $s_3 = f(u_{10})$. Then $\theta_{s_1,s_2,s_3}$ is the following dependency:

$$\forall x \forall y \forall u_1 \forall u_7 \forall u_8 \forall u_{10}\, \Big( S(x, y) \wedge \mathrm{dom}(u_1) \wedge \mathrm{dom}(u_7) \wedge \mathrm{dom}(u_8) \wedge \mathrm{dom}(u_{10}) \wedge$$
$$\zeta_{f(\_)}(x) = u_1 \wedge \zeta_{f(\_)}(y) = \zeta_{g(\_,\_)}(u_7, u_8) \wedge \zeta_{g(\_,\_)}(x, x) = \zeta_{f(\_)}(u_{10}) \wedge$$
$$\zeta_{f(\_)}(x) = \zeta_{g(\_,g(\_,\_))}(u_1, u_7, u_8) \;\rightarrow\; T(\zeta_{f(f(\_))}(u_{10})) \Big).$$

We note that equalities $\zeta_{f(\_)}(x) = u_1$, $\zeta_{f(\_)}(y) = \zeta_{g(\_,\_)}(u_7, u_8)$ and $\zeta_{g(\_,\_)}(x, x) = \zeta_{f(\_)}(u_{10})$
correspond to $\zeta(t_1) = \zeta(s_1)$, $\zeta(t_2) = \zeta(s_2)$ and $\zeta(t_3) = \zeta(s_3)$ in the definition of dependency
$\theta_{s_1,s_2,s_3}$, respectively. Furthermore, we note that equality $\zeta_{f(\_)}(x) = \zeta_{g(\_,g(\_,\_))}(u_1, u_7, u_8)$
is generated as follows from equality $f(x) = g(f(x), f(y))$ in $\alpha$. The 1-term $f(x)$ in the
left-hand side of this equality is replaced by $\zeta(f(x)) = \zeta_{f(\_)}(x)$ in $\theta_{s_1,s_2,s_3}$, and the 2-term
$g(f(x), f(y))$ is replaced by $\zeta(g(u_1, g(u_7, u_8))) = \zeta_{g(\_,g(\_,\_))}(u_1, u_7, u_8)$ in $\theta_{s_1,s_2,s_3}$ (since
$t_1 = f(x)$, $s_1 = u_1$, $t_2 = f(y)$ and $s_2 = g(u_7, u_8)$). Finally, we note that $T(\zeta_{f(f(\_))}(u_{10}))$
is obtained by replacing the 2-term $f(g(x, x))$ by $\zeta(f(f(u_{10}))) = \zeta_{f(f(\_))}(u_{10})$ (since $t_3 = g(x, x)$
and $s_3 = f(u_{10})$).

Given that $\mathrm{dom}(x) = \exists u\, S(x, u) \vee \exists v\, S(v, x)$ in this case, we have that the following
dependency is one of the elements of $\Theta_{s_1,s_2,s_3}$:

$$\forall x \forall y \forall u_1 \forall u_7 \forall u_8 \forall u_{10} \forall z_1 \forall z_7 \forall z_8 \forall z_{10}\, \Big( S(x, y) \wedge S(u_1, z_1) \wedge S(z_7, u_7) \wedge S(z_8, u_8) \wedge$$
$$S(u_{10}, z_{10}) \wedge \zeta_{f(\_)}(x) = u_1 \wedge \zeta_{f(\_)}(y) = \zeta_{g(\_,\_)}(u_7, u_8) \wedge \zeta_{g(\_,\_)}(x, x) = \zeta_{f(\_)}(u_{10}) \wedge$$
$$\zeta_{f(\_)}(x) = \zeta_{g(\_,g(\_,\_))}(u_1, u_7, u_8) \;\rightarrow\; T(\zeta_{f(f(\_))}(u_{10})) \Big).$$

As in the proof of Theorem 7.1, function symbols $\zeta_{f(\_)}$, $\zeta_{g(\_,\_)}$ are used to represent functions
$f$ and $g$, respectively, and $\zeta_{g(\_,g(\_,\_))}$, $\zeta_{f(f(\_))}$ are used to represent functions $g(x, g(y, z))$ and
$f(f(x))$, respectively, thus eliminating the nesting of functions from $\sigma$. It is important to
notice that the preceding dependency makes explicit some properties that are implicit in $\sigma$.
Given that $f$ and $g$ are function symbols in $\sigma$, we know that if $f(x) = u_1$, $f(y) = g(u_7, u_8)$,
$g(x, x) = f(u_{10})$, $f(x) = g(f(x), f(y))$ and $T(f(g(x, x)))$ hold, then $f(x) = g(u_1, g(u_7, u_8))$
and $T(f(f(u_{10})))$ also hold. But this property is not immediately true for $\zeta_{g(\_,g(\_,\_))}$ and
$\zeta_{f(f(\_))}$ and, thus, we have to include the preceding dependency in $\Theta$ to enforce it. □

Assume that $\chi_1, \ldots, \chi_k$ are the function symbols mentioned in $\Theta$. Then unnested SO
tgd $\sigma^*$ is defined as:

$$\exists \chi_1 \cdots \exists \chi_k \Big(\bigwedge \Theta\Big).$$

Next we show that $\sigma$ and $\sigma^*$ are equivalent.

($\Rightarrow$) If $(I, J) \models \sigma$, then it is straightforward to prove that $(I, J) \models \sigma^*$ (the interpretation
of each function symbol mentioned in $\Theta$ is defined from the corresponding composition of
the interpretations of the function symbols from $\sigma$).
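The composition used in this direction can be made concrete. The sketch below is a minimal illustration, assuming a hypothetical encoding of skeletons as nested tuples with `"_"` as the placeholder and using arbitrary toy interpretations of $f$ and $g$; it builds the interpretation of a symbol such as $\zeta_{g(\_,g(\_,\_))}$ by composing the interpretations of the function symbols from $\sigma$.

```python
def compose_from_skeleton(skeleton, interpretations):
    """Interpret the symbol attached to `skeleton` by composing the given functions.

    `skeleton` is "_" or a tuple (symbol, sub_skeleton_1, ..., sub_skeleton_k);
    placeholders are filled left to right with the arguments of the result.
    """
    def evaluate(node, values):
        if node == "_":
            return next(values)
        symbol, *sub_skeletons = node
        return interpretations[symbol](*[evaluate(sub, values) for sub in sub_skeletons])

    def composed(*arguments):
        return evaluate(skeleton, iter(arguments))

    return composed


# Toy interpretations over the integers (an assumption made only for this example).
interpretations = {"f": lambda a: a + 1, "g": lambda a, b: 10 * a + b}

zeta_g_g = compose_from_skeleton(("g", "_", ("g", "_", "_")), interpretations)
zeta_f_f = compose_from_skeleton(("f", ("f", "_")), interpretations)
print(zeta_g_g(1, 2, 3), zeta_f_f(5))
```

Here `zeta_g_g(a, b, c)` computes the same value as applying the interpretation of $g$ to `a` and to the result of applying it to `b` and `c`, which is the role that $\zeta_{g(\_,g(\_,\_))}$ plays in Example 7.8.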

($\Leftarrow$) Assume that $(I, J) \models \sigma^*$ with the instantiations $\chi_1^0, \ldots, \chi_k^0$ of the function symbols
$\chi_1, \ldots, \chi_k$. To show that $(I, J) \models \sigma$, we first need to define from $\chi_1^0, \ldots, \chi_k^0$ the
instantiations $f_1^0, \ldots, f_\ell^0$ of the function symbols $f_1, \ldots, f_\ell$, and then we have to show that $(I, J)$
satisfies all the conjuncts of $\sigma$ with these instantiations.
Let $\mathcal{FT}(I)$ be the set of all pairs $(f_i, \bar a)$ such that (1) $i \in \{1, \ldots, \ell\}$, (2) $\bar a$ is a tuple of
elements from $\mathrm{dom}(I)$, and (3) the length of $\bar a$ is the same as the arity of function symbol
$f_i$. Furthermore, let $<$ be an arbitrary linear order over $\mathcal{FT}(I)$, and define $\mathcal{DT}(J)$ as the
set of elements $a \in \mathrm{dom}(J)$ such that $a \in \mathrm{dom}(I)$ or $a = \zeta^0_{f_i(\_,\ldots,\_)}(\bar a)$ for some $i \in \{1, \ldots, \ell\}$
and tuple $\bar a$ of elements from $\mathrm{dom}(I)$. The sets $\mathcal{FT}(I)$ and $\mathcal{DT}(J)$ are used to define a
substitution $\kappa$, which in turn is used to define the interpretations of function symbols
$f_1, \ldots, f_\ell$. More precisely, for every $a \in \mathcal{DT}(J)$, define:

$$\kappa(a) = \begin{cases} (\_, a) & \text{if } a \in \mathrm{dom}(I), \\ (f_i(\_, \ldots, \_), \bar b) & \text{if } a \notin \mathrm{dom}(I) \text{ and } (f_i, \bar b) = \min\{(f_k, \bar c) \in \mathcal{FT}(I) \mid \zeta^0_{f_k(\_,\ldots,\_)}(\bar c) = a\}. \end{cases}$$

Note that in the second case in the definition of $\kappa(a)$, the set over which the min is taken
is nonempty, because $a \in \mathcal{DT}(J)$ and $a \notin \mathrm{dom}(I)$.

Define the instantiations $f_1^0, \ldots, f_\ell^0$ of function symbols $f_1, \ldots, f_\ell$ as follows. Given that
$(I, J) \models \sigma^*$, we have by definition of satisfaction for SO tgds that there exists a countably
infinite universe $U$ such that (1) $U$ is the union of $\mathrm{dom}(I) \cup \mathrm{dom}(J)$ and a set of nulls,
and (2) $(U; I, J)$ satisfies $\sigma^*$ in the standard second-order logic sense. Assume that $\bot$ is a
fresh null value ($\bot \notin U$) and that the arity of function symbol $f_i$ is $k_i$ ($1 \le i \le \ell$). Then
the domain of each one of the functions $f_1^0, \ldots, f_\ell^0$ is defined to be $U \cup \{\bot\}$, and for every
$(a_1, \ldots, a_{k_i}) \in (U \cup \{\bot\})^{k_i}$, we define:

$$f_i^0(a_1, \ldots, a_{k_i}) = \begin{cases} \zeta^0_{f_i(w_1,\ldots,w_{k_i})}(\bar b_1, \ldots, \bar b_{k_i}) & \text{if for every } j \in \{1, \ldots, k_i\}, \text{ it holds that } a_j \in \mathcal{DT}(J) \text{ and } \kappa(a_j) = (w_j, \bar b_j), \\ \bot & \text{otherwise.} \end{cases}$$

Notice that if $a_j = \bot$ in the definition above, for some $j \in \{1, \ldots, k_i\}$, then $f_i^0(a_1, \ldots, a_{k_i}) = \bot$.

Next we show that $(I, J)$ satisfies$^4$ every conjunct of $\sigma$ with the instantiations $f_1^0, \ldots, f_\ell^0$
of function symbols $f_1, \ldots, f_\ell$. But before doing this, we give an example that shows
how the strategy of the proof works.

3 Again, the universe can even be taken to be finite, but we do not need this.

4 When we say that $(I, J)$ satisfies a formula, we mean that $(U; I, J)$ satisfies the formula. Similarly, when
we say that $I$ satisfies a formula, we mean that $(U; I)$ satisfies the formula, and likewise for $J$ satisfying a
formula.

Example 7.9. Consider again conjunct (7.10). To prove that $(I, J)$ satisfies this conjunct
under the preceding definition of the functions in $\sigma$, we have to prove that if:

$$I \models S(a, b) \wedge f(a) = g(f(a), f(b)),$$

then $J \models T(f(g(a, a)))$. In order to prove this, we first need to figure out what the values
of $f^0(a)$, $f^0(b)$, $g^0(f^0(a), f^0(b))$ and $f^0(g^0(a, a))$ are. Given that $a, b \in \mathrm{dom}(I)$, we have
that $f^0(a) = \zeta^0_{f(\_)}(a)$ and $f^0(b) = \zeta^0_{f(\_)}(b)$. The definition of $g^0(f^0(a), f^0(b))$ depends on
whether $\zeta^0_{f(\_)}(a)$ and $\zeta^0_{f(\_)}(b)$ belong to $\mathrm{dom}(I)$. Assume that $\zeta^0_{f(\_)}(a) = a_1$ and $\zeta^0_{f(\_)}(b) = b_1$,
where $a_1 \in \mathrm{dom}(I)$ and $b_1 \notin \mathrm{dom}(I)$. Then by the preceding definition, we have to compute
$\kappa(a_1)$ and $\kappa(b_1)$ in order to compute the value of $g^0(f^0(a), f^0(b))$. We have that $\kappa(a_1) =
(\_, a_1)$ since $a_1 \in \mathrm{dom}(I)$, and we assume for this example that $\kappa(b_1) = (g(\_, \_), (c_1, c_2))$,
where $c_1, c_2 \in \mathrm{dom}(I)$. That is, we assume that $\zeta^0_{g(\_,\_)}(c_1, c_2) = b_1$ and $(g, (c_1, c_2))$ is the
smallest element $(h, \bar d)$ in $\mathcal{FT}(I)$, according to the linear order $<$, satisfying the condition
$\zeta^0_{h(\_,\ldots,\_)}(\bar d) = b_1$. Thus, by the preceding definition, we have that:

$$g^0(f^0(a), f^0(b)) = g^0(a_1, b_1) = \zeta^0_{g(\_,g(\_,\_))}(a_1, c_1, c_2).$$

Finally, we also need to know what the value of $f^0(g^0(a, a))$ is. By the preceding definition,
we know that $g^0(a, a) = \zeta^0_{g(\_,\_)}(a, a)$. Assume that $\zeta^0_{g(\_,\_)}(a, a) = d_1$ with $d_1 \notin \mathrm{dom}(I)$.
Then, as in the previous case, we need to compute $\kappa(d_1)$ in order to compute $f^0(g^0(a, a))$.
Assume for this example that $\kappa(d_1) = (f(\_), (d_2))$ where $d_2 \in \mathrm{dom}(I)$. That is, assume that
$\zeta^0_{f(\_)}(d_2) = d_1$ and $(f, (d_2))$ is the smallest element $(h, \bar d)$ in $\mathcal{FT}(I)$, according to the linear
order $<$, satisfying the condition $\zeta^0_{h(\_,\ldots,\_)}(\bar d) = d_1$. Thus, by the preceding definition, we
have that:

$$f^0(g^0(a, a)) = f^0(d_1) = \zeta^0_{f(f(\_))}(d_2). \tag{7.11}$$

Therefore, from the previous discussion and the fact that $I \models f(a) = g(f(a), f(b))$, we
conclude that:

$$I \models S(a, b) \wedge \mathrm{dom}(a_1) \wedge \mathrm{dom}(c_1) \wedge \mathrm{dom}(c_2) \wedge \mathrm{dom}(d_2) \wedge \zeta_{f(\_)}(a) = a_1 \wedge$$
$$\zeta_{f(\_)}(b) = \zeta_{g(\_,\_)}(c_1, c_2) \wedge \zeta_{g(\_,\_)}(a, a) = \zeta_{f(\_)}(d_2) \wedge \zeta_{f(\_)}(a) = \zeta_{g(\_,g(\_,\_))}(a_1, c_1, c_2).$$

Hence, given that we assume that $(I, J) \models \sigma^*$ and the following dependency is one of the
formulas $\theta_{s_1,s_2,s_3}$ (see Example 7.8):

$$\forall x \forall y \forall u_1 \forall u_7 \forall u_8 \forall u_{10}\, \Big( S(x, y) \wedge \mathrm{dom}(u_1) \wedge \mathrm{dom}(u_7) \wedge \mathrm{dom}(u_8) \wedge \mathrm{dom}(u_{10}) \wedge$$
$$\zeta_{f(\_)}(x) = u_1 \wedge \zeta_{f(\_)}(y) = \zeta_{g(\_,\_)}(u_7, u_8) \wedge \zeta_{g(\_,\_)}(x, x) = \zeta_{f(\_)}(u_{10}) \wedge$$
$$\zeta_{f(\_)}(x) = \zeta_{g(\_,g(\_,\_))}(u_1, u_7, u_8) \;\rightarrow\; T(\zeta_{f(f(\_))}(u_{10})) \Big),$$

we conclude that $J \models T(\zeta_{f(f(\_))}(d_2))$. But we know from (7.11) that $f^0(g^0(a, a)) =
\zeta^0_{f(f(\_))}(d_2)$ and, thus, we have that $J \models T(f(g(a, a)))$, which was to be shown. □

In general, we have to show that if $\forall \bar x\, (\varphi(\bar x) \rightarrow \psi(\bar x))$ is a conjunct of $\sigma$ and $I \models \varphi(\bar a)$
with the instantiations $f_1^0, \ldots, f_\ell^0$ of the function symbols $f_1, \ldots, f_\ell$, then $J \models \psi(\bar a)$ with
these instantiations. It is straightforward but lengthy to generalize the strategy shown in
the previous example to this case. In particular, given that in the construction of $\sigma^*$ we
consider all the possible cases for substitution $\kappa$, the previous strategy can be applied in
general. This concludes the proof of the lemma. □



Proof of Theorem 7.2. The theorem is proved by induction on the nesting depth $n$ of an
SO tgd. If $n = 1$, then the property trivially holds, and if $n = 2$, then the property holds
by Lemma 7.3. Thus, let $\sigma$ be an SO tgd from a source schema S to a target schema T,
and assume that the nesting depth of $\sigma$ is $n \ge 3$. Moreover, assume that the theorem holds
for every SO tgd $\sigma'$ of nesting depth $n' < n$.

By Theorem 8.4 in [23], we know that there exist schema mappings $\mathcal{M}_1, \mathcal{M}_2, \mathcal{M}_3, \ldots,
\mathcal{M}_{n+1}$ such that $\sigma$ specifies $\mathcal{M}_1 \circ \mathcal{M}_2 \circ \mathcal{M}_3 \circ \cdots \circ \mathcal{M}_{n+1}$ and every mapping $\mathcal{M}_i$ ($1 \le i \le n+1$)
is specified by a set of st-tgds. For every $i \in \{1, \ldots, n+1\}$, let $\sigma_i$ be an unnested SO tgd
that specifies $\mathcal{M}_i$. We know, by the definition of the algorithm Compose in [23], that there
exists an SO tgd $\sigma_{12}$ that specifies the composition of the schema mappings specified by $\sigma_1$
and $\sigma_2$ and whose nesting depth is at most 2. By Lemma 7.3, we have that there exists an
unnested SO tgd $\sigma^*_{12}$ that is equivalent to $\sigma_{12}$. Thus, by considering again the definition
of the algorithm Compose in [23], we have that there exists an SO tgd $\sigma_{13}$ such that $\sigma_{13}$
specifies the composition of the schema mappings specified by $\sigma^*_{12}$ and $\sigma_3$ and whose nesting
depth is at most 2. Hence, by considering again Lemma 7.3, we conclude that there exists
an unnested SO tgd $\sigma^*_{13}$ that specifies the composition of the schema mappings specified
by $\sigma^*_{12}$ and $\sigma_3$ and, thus, also specifies $\mathcal{M}_1 \circ \mathcal{M}_2 \circ \mathcal{M}_3$. Finally, by considering again the
definition of algorithm Compose in [23], we have that there exists an SO tgd $\sigma'$ such that:
(1) $\sigma'$ specifies the composition of the mappings specified by $\sigma^*_{13}, \sigma_4, \ldots, \sigma_{n+1}$; and (2)
the depth of nesting of $\sigma'$ is at most $n - 1$ (since $\sigma^*_{13}, \sigma_4, \ldots, \sigma_{n+1}$ are all unnested SO
tgds). Therefore, we conclude that there exists an SO tgd $\sigma'$ that is equivalent to $\sigma$ and
whose depth of nesting is at most $n - 1$. But then by induction hypothesis, there exists an
unnested SO tgd $\sigma^*$ that is equivalent to $\sigma'$ and, hence, there exists an unnested SO tgd $\sigma^*$
that is equivalent to $\sigma$. This concludes the proof of the theorem. □

8. Concluding Remarks 

We have investigated the question of what language is needed to specify the composition of 
schema mappings with target constraints. In particular, we showed that st-SO dependencies 
(along with appropriate target constraints) are exactly the right language for specifying the 
composition of standard schema mappings (those specified by st-tgds, target egds, and 
a weakly acyclic set of target tgds). By contrast, we showed that SO tgds, even with 
arbitrary source and target constraints, are not rich enough to be able to specify in general 
the composition of two standard schema mappings. In addition to their expressive power, 
we also showed that st-SO dependencies enjoy other desirable properties. In particular, 
they have a polynomial-time chase that generates a universal solution, which can be used 
to find the certain answers to unions of conjunctive queries in polynomial time. 

We proved the surprising result that SO tgds and st-SO dependencies can be denested: 
that is, each such dependency is equivalent to another dependency of that type with no 
nested function symbols. These denesting results can be used to "collapse" multiple
compositions of schema mappings into the composition of two schema mappings of that type.
In particular, we obtain the unexpected result that the composition of an arbitrary number 
of st-tgd mappings is equivalent to the composition of only two st-tgd mappings. 

Our results gave us two ways to "simplify" the composition of an arbitrary number of
st-tgd mappings. First, we could replace the composition by a single schema mapping, specified
by an unnested SO tgd. Second, we could replace the composition by the composition of
only two st-tgd schema mappings. A similar comment applies to the composition of an
arbitrary number of standard schema mappings.